Phoenix TUG – Workbook Performance

Here is a copy of the presentation deck from the Phoenix Tableau User Group – April 2016 meeting. It was a pleasure to be able to speak to such an enthusiastic crowd of Tableau Users.

Other bits: 

  • You can find the code samples here.
  • If you weren’t able to receive a free trial card, please let me know and I’ll get you one ASAP.
  • It’s all about the Tableau M.M.A 🙂

Understanding Tableau Server Workbook Performance by knowing Geography

A few weeks back, I did a webinar with Tableau and the Server Admin User Group and I chatted about our TWB Internals database. In that talk, I mentioned a couple of tables:

  • Workbook Performance
  • Workbook Metadata

Well, I’m happy to announce a third and final piece to the data model: geo data. This is often a missing piece in the general ‘performance’ discussion on Tableau Server. What’s more, if your organization serves users both internally and externally, then performance becomes a big part of the first impression. For example, you want to know why users in various locations mention that a view is slow to load even though it’s been tested at, say, less than two seconds. Adding the geography (and ISP) of the people requesting the view makes the data (and workbooks) a bit easier to diagnose. Bonus: you can also understand the most-requested views, HTTP status codes (500s, etc.), and more.

ts-geo-perf-0
TWB Internals Database

Geo data: steps to implement

  • Pull the Apache logs (I used Logentries and its open API to pre-filter before exporting).
  • Parse the logs for basic columns: ip, datetime, request, port, etc.
  • Get a distinct list of IPs and look each one up
  • Add to a database (choose your flavor)
  • Automate (a rough sketch of the parse/lookup steps follows below)
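
To make those steps concrete, here’s a minimal PowerShell sketch of the parse-and-lookup piece. The log path is made up, the regex only covers a standard combined-format line, and ipinfo.io is used purely as a stand-in lookup service; swap in whatever pieces your environment actually uses.

# Minimal sketch: parse Apache access-log lines and geo-locate the distinct IPs
$logPath = 'C:\logs\access.log'    # placeholder path

# Basic combined-log regex: ip, datetime, request, status
$pattern = '^(?<ip>\S+) \S+ \S+ \[(?<datetime>[^\]]+)\] "(?<request>[^"]*)" (?<status>\d{3})'

$hits = Get-Content $logPath | ForEach-Object {
    if ($_ -match $pattern) {
        [pscustomobject]@{
            ip       = $Matches.ip
            datetime = $Matches.datetime
            request  = $Matches.request
            status   = $Matches.status
        }
    }
}

# Look up each distinct IP once; in practice, seed $known from the IP helper table so stored IPs are skipped
$known = @{}
$geo = $hits.ip | Sort-Object -Unique |
    Where-Object { -not $known.ContainsKey($_) } |
    ForEach-Object { Invoke-RestMethod -Uri "https://ipinfo.io/$_/json" }    # returns city, region, country, loc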

Connect the dots

ts-geo-perf-2
Parsing the Apache logs

The final table in the model took a few steps. It’s an IP helper table that stores the information for users who request views (city, state, latitude, longitude, etc.). Rather than look up every IP on every run, I check this table first and omit any IP we already have stored from the lookup. Once the basic script was complete, we automated it and joined the previous two tables together in order to understand the performance of a view as it relates to geography. So, your SQL to join them all together might look something like this:

SELECT
w.site
,w.user
,w.workbook
,a.datetimemst
,a.ip
,a.httpstatus
,g.city
,g.region
,g.country
,g.loc
,sum(w.elapsed_sec) as load_time
FROM tableau_workbook_perf as w
JOIN tableau_apache_perf as a on w.req = a.requestid
JOIN tableau_apache_geo as g on g.ip = a.ip
GROUP BY
w.site
,w.user
,w.workbook
,a.datetimemst
,a.ip
,a.httpstatus
,g.city
,g.region
,g.country
,g.loc

Assessing Performance

Now that we have all the pieces together, it’s time to visualize it back in Tableau (ah, yes, the circle of life, or data). We can now take the highly viewed workbooks and understand whether performance is a local/geographical issue and/or a time issue. When we combine this with the TWB Internals metadata, well, we have a very complete picture of a workbook’s value.

ts-geo-perf-1
Tracking view performance along with ISP / Geo data

Tableau Garbage Collector and Metadata Discovery

In the beginning, there were TWBs

Ok, so it’s not really ‘garbage’, because people put effort into designing and creating both their workbooks and their Tableau Server data sources. What I’m talking about here is an outcome of a self-service analytics environment: stuff just happens to land in projects/sites and it grows and grows. While the work may have been useful at one point, those workbooks/data sources still consume valuable resources.

ts-g-collection-1
Our sample code snippet

 

In order to keep the environment swift and snappy, we’ll want to make sure we archive and remove the old materials. The beauty of this is that one query will accomplish two major avenues of management (Tableau Garbage Collection and Tableau Metadata Discovery). In doing so, we’ll follow these steps:

  • Export the old stuff from Tableau (TWB & TDS)
  • Store it on Amazon S3
  • Remove it locally
  • Update the channel/users via Slack

Tools needed:

  • Tableau Server admin (for Postgres access and REST API)
  • Slack (not necessary, but helpful for alerts/notification/collaboration)
  • Amazon AWS account (not necessary, but also helpful for storage)
  • PowerShell > v3

One Query to Rule Them All

This query will allow the admin to control/remove everything they need from Tableau Server. Here are the 11 fields:

  • Id: Id for the twb(x) or tds(x)
  • Last_view_time: max time the object was viewed
  • Luid: luid for twb(x) or tds(x)
  • Site_id: site id
  • Site_name: site name
  • Site_name_proper: site name for REST API
  • Site_luid: site luid
  • Export_name: name for export
  • HasExtract: whether or not there is a TDE
  • ExportType: datasources/workbooks endpoint for REST API
  • OID: ID for object in Postgres (lo_export)

Yes, that’s it. Now we’ll write a daily/weekly/monthly collection script that will clean up everything from Tableau Server.

ts-g-collection-0
Main fields for GC
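
As a rough sketch of what that collection loop can look like: assume $rows holds the output of the query above (the 11 fields), $token is a Tableau REST API auth token from a prior sign-in, and the AWS Tools for PowerShell module is available. The server URL, staging folder, bucket name, and API version segment are placeholders.

$server  = 'https://tableau.example.com'    # placeholder
$staging = 'D:\tableau_gc'                  # placeholder local staging folder

foreach ($row in $rows) {
    $suffix = if ($row.ExportType -eq 'workbooks') { 'twb' } else { 'tds' }
    if ($row.HasExtract) { $suffix += 'x' }
    $file = Join-Path $staging "$($row.Export_name).$suffix"

    # 1. Export the content via the REST API (X-Tableau-Auth comes from a prior sign-in call;
    #    the api version segment depends on your Server release)
    $uri = "$server/api/2.0/sites/$($row.Site_luid)/$($row.ExportType)/$($row.Luid)/content"
    Invoke-WebRequest -Uri $uri -Headers @{ 'X-Tableau-Auth' = $token } -OutFile $file

    # 2. Archive the file to S3
    Write-S3Object -BucketName 'tableau-archive' -File $file -Key "$($row.Site_name)/$($row.Export_name).$suffix"

    # 3. Remove the object from Tableau Server
    Invoke-RestMethod -Method Delete -Headers @{ 'X-Tableau-Auth' = $token } `
        -Uri "$server/api/2.0/sites/$($row.Site_luid)/$($row.ExportType)/$($row.Luid)"
}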

Oh wait, it’s gone, now I suddenly need it

Ok, so perhaps we were a bit overzealous and removed too much. Have no fear, because this process also archives everything on Amazon S3. If it’s not on Tableau Server, it’s on Amazon and easy to recover and restore. Once we take a diff between what we have locally and what we have on S3 (confirming the archive made it), we remove everything locally. Now, older, er, retro workbooks will forever be saved.
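
A minimal sketch of that diff-then-delete step, again assuming the AWS Tools for PowerShell module and the same (made up) staging folder and bucket as above:

$staging = 'D:\tableau_gc'
$onS3    = Get-S3Object -BucketName 'tableau-archive' | ForEach-Object { Split-Path $_.Key -Leaf }
$local   = Get-ChildItem -Path $staging -File

# Anything confirmed on S3 is safe to delete locally; anything missing gets flagged for a re-run
$local | Where-Object { $onS3 -contains $_.Name } | Remove-Item
$local | Where-Object { $onS3 -notcontains $_.Name } |
    ForEach-Object { Write-Warning "Not archived yet, keeping: $($_.Name)" }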

Send a Summary to the Admins: It’s full of TWBs

Since this is automated, we keep a history of what was removed and, after the script runs, we send a summary to our Slack admin channel. This is the last step in the process and gives info such as total disk space recovered, total TWBs removed, etc.
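
Here’s a hedged sketch of what that summary post can look like, using a Slack incoming webhook; the webhook URL and the $removed variable (the files deleted in this run) are placeholders for whatever your script actually tracks:

$webhook = 'https://hooks.slack.com/services/XXX/YYY/ZZZ'    # placeholder; the webhook points at the admin channel
$mb      = [math]::Round(($removed | Measure-Object -Property Length -Sum).Sum / 1MB, 1)

$payload = @{ text = "Tableau GC finished: removed $($removed.Count) items and recovered $mb MB of disk." } |
    ConvertTo-Json

Invoke-RestMethod -Uri $webhook -Method Post -Body $payload -ContentType 'application/json'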

ts-g-collection-3
Sample data sent to Slack

The other side of the coin

You may have noticed the ‘OID’ field in the query. This is the easiest way to export metadata info from Tableau (ever try to download a 750MB workbook?). And combining this with what I’ve discussed here and here will make your life so much more fabulous. An example workflow would be:

  • Export the TWB / TWBX / TDS / TDSX (a rough export sketch follows this list)
  • Parse the metadata info (columns, rows, filters, and data sources)
  • Use the Tableau class to export custom SQL code
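
A rough sketch of that export using psql’s \lo_export meta-command against the Tableau repository, with the OID column from the query. The host name and destination path are made up; the readonly user, port 8060, and the workgroup database are the usual repository defaults, but check your own setup:

$oid  = $row.OID                                     # from the query above
$dest = "D:/tableau_gc/$($row.Export_name).twbx"     # forward slashes avoid psql's backslash escaping

# \lo_export is a psql meta-command: it pulls the large object down to the client machine
& psql -h tableau.example.com -p 8060 -U readonly -d workgroup -c "\lo_export $oid '$dest'"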

ts-g-collection-2

Tableau Server: Alert Gracefully

Some background

A while back, I wrote about how we make use of the background_jobs table in Tableau Server. You can see the code here. Although it did a great job of making our people aware of what failed (and when), it still arrived too frequently, and it arrived via email. Email is typically more resource intensive: you have to take in all the bits of the message (who, when, where, what, etc.), then switch over to figure out where (in all your potential Tableau sites) the workbook/data source could be located. With the beauty of Slack and its well-documented API, we can now send the alerts as Slack messages.

Tools needed

  • Slack
  • Tableau (PostgreSQL access)
  • PowerShell

Why Alert Less?

Put simply, it avoids ‘alert fatigue’ and gives just enough information for action: enough to glance at it and immediately know where to go and how to fix it. Rather than emailing at all hours throughout the day, this method just issues a direct message via Slack to each user. Further, it doesn’t embed an image or CSV into an email (which would actually require more work). Bonus: we also remove the ‘com.tableausoftware.nativeapi.dll.DataSourceException:‘ portion of the error message. You would be surprised at how much people ignore when they are unsure about a message.

It’s a private message with four (and only four) bits of information:

  • workbook or data source name
  • site
  • time of failure
  • friendly error message (we actually keep the most common and reply with those messages instead)
ts-alerts-0
sample Slack message to a user

The Core Query

This is it, seriously.

ts-alerts-1
The text enclosed in yellow is configurable to your environment

Getting the Slack Username/ID

This was the challenging part. The goal was to add an additional column to the query output above so when we did our notification loop in the code, we could dynamically send to the owner. This little bit made it pretty easy to just add their Slack ID to the Tableau query:

# Note: call curl.exe explicitly so PowerShell's built-in curl alias (Invoke-WebRequest) isn't picked up
$t = curl.exe -F token="$($Slack)" https://slack.com/api/users.list -k --silent

($t | ConvertFrom-Json).members |
    ForEach-Object -Process { $sUser = $_.id; $_ } |
    Select-Object @{n='slack_username';e={$sUser}}, @{n='profile_email';e={$_.profile.email}}
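
And a sketch of the notification loop itself, assuming $failedJobs holds the core query output with that slack_username column joined in; the notes/title/site_name/completed_at property names are illustrative stand-ins for whatever your query returns:

foreach ($job in $failedJobs) {
    # Strip the noisy exception prefix so only the friendly part is sent
    $friendly = $job.notes -replace 'com\.tableausoftware\.nativeapi\.dll\.DataSourceException:\s*', ''

    $body = @{
        token   = $Slack
        channel = $job.slack_username    # a Slack user ID as the channel sends a direct message
        text    = "*$($job.title)* on site *$($job.site_name)* failed at $($job.completed_at): $friendly"
        as_user = 'true'
    }
    Invoke-RestMethod -Uri 'https://slack.com/api/chat.postMessage' -Method Post -Body $body | Out-Null
}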

Here’s the process for each day:

  • Split the day up into, wait for it, AM and PM.
  • Run the AM/PM code at some desired time (see the scheduling sketch below)
  • Notify each user via Slack, individually.
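
One way to handle the scheduling piece is with PowerShell scheduled jobs (PSScheduledJob module); the times and script path here are illustrative:

$am = New-JobTrigger -Daily -At '7:00 AM'
$pm = New-JobTrigger -Daily -At '1:00 PM'
Register-ScheduledJob -Name 'TableauAlertsAM' -FilePath 'D:\scripts\Send-TableauAlerts.ps1' -Trigger $am
Register-ScheduledJob -Name 'TableauAlertsPM' -FilePath 'D:\scripts\Send-TableauAlerts.ps1' -Trigger $pm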

You may be wondering: what happens if an extract fails during some gap? For example, what if someone’s extract failed at 11 pm and we didn’t run the script until the next day? Pretty simple: if an extract runs at 11 pm and it is mission critical, it’s being watched and fixed at 11-ish pm anyway. We’re talking about daily (read: now) alerts. If I’m alerted today about something that happened yesterday, I’m less likely to do something about it. But there’s all sorts of psychology about that 🙂

 

Tableau Vapor Rub: Getting the goodness of Tableau, all over.

Some background:

While I had originally joked about the ‘Vapor Rub’ title, it has really, er, stuck to me at this point. It sort of describes the wonder of getting the right data into Tableau: once it’s there and properly formatted, everything just feels better.

vapor_rub_0
The ideal state.

Part I: Getting the data from all the places

The goal, then, is to get all the data we need for analysis into some sort of table (or tables). Ideally, it would be one table, but when that’s not actually possible, we’ll create our own data model. If your organization has data in a multitude of places, it becomes challenging for Tableau developers to acquire it and shape it into useful analysis without sacrificing some performance. Adding additional data sources can, depending on your approach, slow your workbook. Further, when we’re talking about massive data, performance will suffer if one doesn’t approach it efficiently.

vapor_rub_1

Part II: Processing the data by distributing the load (PowerShell Workflows) across computers

Here’s an example: pivoting data, a lot of data. We needed to pivot pipe-delimited fields into their own columns, add the results to a database, and then create a Tableau Server data source. A few caveats:

  • It needs to happen in sequence (part A needs to happen before part B)
  • It could be a long-running task, so if a part fails, I want the code to pick up where it left off
  • I need to distribute the load across computers because this is a lot of data

Now, grabbing the data via code isn’t that difficult of a task. We iron out our approach, sketch out some table or tables we might need to create and, hopefully, the data speaks. To do all this, we’ll use PowerShell Workflows.

Workflows are built around activities. These allow us to implement logic that wouldn’t necessarily fit within the function paradigm (though, they are related to some extent). Activities exist on their own, so ‘bucketing’ each piece of your code will assist a lot.

The difficult part is making sure we add the right checkpoints and split the load accordingly; we have a ton of files that need their data pivoted, and doing that work on one machine isn’t ideal. What’s more, if we do all the hard work of parsing and it fails or the computer crashes, we would need to start over. Having a checkpoint (via Checkpoint-Workflow) makes life a lot easier. At a very high level, our process will look like this (a minimal workflow skeleton follows the list):

  • Find files/data (S)
  • Do the parsing/splitting, etc (S)
  • Add to database (P)
  • Create Tableau data source (P)
  • Alert users (via Slack) (P/S)

NOTE: S = Sequence; P = Parallel
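
Here’s a minimal workflow skeleton matching the S/P labels above. The activity bodies, file share, and computer names are placeholders; the point is just the shape (sequence, parallel, foreach -parallel, and Checkpoint-Workflow):

workflow Invoke-PivotPipeline {
    param([string]$SourcePath)

    sequence {
        # (S) Find files/data
        $files = Get-ChildItem -Path $SourcePath -Filter '*.txt'
        Checkpoint-Workflow

        # (S) Parse/split the pipe-delimited fields, fanned out per file
        foreach -parallel ($f in $files) {
            InlineScript { <# pivot the fields in $using:f here #> }
        }
        Checkpoint-Workflow
    }

    parallel {
        # (P) Add to database
        InlineScript { <# bulk insert the pivoted rows #> }
        # (P) Create the Tableau data source
        InlineScript { <# publish a TDS/TDE to Tableau Server #> }
    }

    # (P/S) Alert users via Slack
    InlineScript { <# post a summary message #> }
}

# Distribute across machines by targeting remote nodes, e.g.:
# Invoke-PivotPipeline -SourcePath '\\fileshare\pivots' -PSComputerName worker01, worker02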

 

Part III: Adding consolidated data to a database

After doing the work of parsing and pivoting our data, we now need to add it to the database. This is also pretty straightforward if you’ve done it once. And the right connection string can make all the difference. As mentioned above, this will be done in parallel.
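
For illustration, here’s a small sketch of that insert step using SqlBulkCopy from PowerShell; the connection string, destination table, and $dataTable (a DataTable of pivoted rows) are assumptions, not the exact setup from the talk:

$connString = 'Server=dbserver01;Database=TableauStaging;Integrated Security=True'

$bulk = New-Object System.Data.SqlClient.SqlBulkCopy($connString)
$bulk.DestinationTableName = 'dbo.pivoted_data'
$bulk.BatchSize            = 50000      # larger batches keep round trips down; tune to taste
$bulk.WriteToServer($dataTable)
$bulk.Close()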

Part IV: Consume in Tableau

Having done the work mentioned above, making a Tableau Server data source is the easy part. We can dynamically create the extract or simply leverage our typical workflow (connect to data and publish to Server).
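
A quick sketch of the publish step with tabcmd (dynamically creating the extract is its own topic); the server, credentials, project, and file name here are placeholders:

& tabcmd login -s https://tableau.example.com -u admin -p $password
& tabcmd publish 'D:\pivots\pivoted_data.tdsx' --project 'Data Sources' --overwrite
& tabcmd logout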

Other Applications:

In doing this, you might be wondering what other ways we can leverage PowerShell Workflows along with Tableau Server. Here are a few that work very well:

  • Tableau Extract Mover/Creator
  • Backups / Restores to various environments
  • CSV to TDE creator on a large scale
  • Creating custom reports/data files for Internal & External Users
  • A Vizable file delivery platform