Understanding Tableau Server Workbook Performance by Knowing Geography

A few weeks back, I did a webinar with Tableau and the Server Admin User Group, where I chatted about our TWB Internals database. In that talk, I mentioned a couple of tables:

  • Workbook Performance
  • Workbook Metadata

Well, I’m happy to announce a third and final piece to the data model: geo data. It’s often the missing piece in the general ‘performance’ discussion on Tableau Server. What’s more, if your organization serves users both internally and externally, then performance becomes a big first impression. For example, you want to know why users in various locations report that a view is slow to load even though it’s been tested at, say, less than two seconds. Adding the geography (and ISP) of the people requesting the view makes the data (and workbooks) much easier to diagnose. As a bonus, you can also see the most requested views, HTTP status codes (500s, etc.), and more.

[Image ts-geo-perf-0: TWB Internals Database]

Geo data: steps to implement

  • Pull the Apache logs (I used Logentries and its open API to pre-filter before exporting).
  • Parse the logs for basic columns: ip, datetime, request, port, etc.
  • Get a distinct list of IPs and look up their geo data (see the sketch after this list).
  • Add the results to a database (choose your flavor).
  • Automate.
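
Since the post doesn’t prescribe a language for the script, here’s a minimal PowerShell sketch of the parse-and-lookup steps. The log path, the combined-log regex, and the ipinfo.io lookup (whose response fields happen to match the city/region/country/loc columns used below) are all assumptions; substitute your own log source and geolocation service.

# A minimal sketch, not the production script: parse Apache combined-log
# lines, then geolocate only the IPs we haven't already stored.
$logPattern = '^(?<ip>\S+) \S+ \S+ \[(?<datetime>[^\]]+)\] "(?<request>[^"]*)" (?<status>\d{3})'

$entries = Get-Content 'C:\logs\access.log' | ForEach-Object {
    if ($_ -match $logPattern) {
        [pscustomobject]@{
            Ip       = $Matches['ip']
            DateTime = $Matches['datetime']
            Request  = $Matches['request']
            Status   = $Matches['status']
        }
    }
}

# The IP helper table (described below) lets us skip IPs we already know.
$knownIps = @()   # in practice: SELECT ip FROM tableau_apache_geo
$newIps   = $entries.Ip | Sort-Object -Unique | Where-Object { $knownIps -notcontains $_ }

foreach ($ip in $newIps) {
    $geo = Invoke-RestMethod -Uri "https://ipinfo.io/$ip/json"
    # $geo.city, $geo.region, $geo.country, $geo.loc -> insert into tableau_apache_geo
}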

Connect the dots

[Image ts-geo-perf-2: Parsing the Apache logs]

The final table in the model took a few steps to build. It’s an IP helper table that stores information about the users who request views (city, state, latitude, longitude, etc.). Rather than look up every IP on every run, I check this table first and omit from the lookup any IP we already have stored. Once the basic script was complete, we automated it and then joined the previous two tables together in order to understand the performance of a view as it relates to geography. So, your SQL to join them all together might look something like this:

SELECT
w.site
,w.user
,w.workbook
,a.datetimemst
,a.ip
,a.httpstatus
,g.city
,g.region
,g.country
,g.loc
,sum(w.elapsed_sec) as load_time
FROM tableau_workbook_perf as w
JOIN tableau_apache_perf as a on w.req = a.requestid
JOIN tableau_apache_geo as g on g.ip = a.ip
GROUP BY
w.site
,w.user
,w.workbook
,a.datetimemst
,a.ip
,a.httpstatus
,g.city
,g.region
,g.country
,g.loc

Assessing Performance

Now that we have all the pieces together, it’s time to visualize it back in Tableau (ah, yes, the circle of life, or data). We can now take the most-viewed workbooks and understand whether performance is a local/geographical issue and/or a time issue. When we combine this with the TWB Internals metadata, well, we have a very complete picture of a workbook’s value.

[Image ts-geo-perf-1: Tracking view performance along with ISP / geo data]

Tableau Server: File Watching and Alerting

Some background:

As it happens, a lot of the fun Tableau Server ideas come from the need to optimize a person’s access to data. When the data isn’t correct or is improperly formatted, Tableau development and analytics take a back seat to ETL work. Thankfully, new features in Tableau make data preparation much easier and faster.

In the interim, however, there are often gaps between what the tools provide and the unique circumstances each enterprise / business deals with. Put simply, data doesn’t always play nice. A modern analyst is one who can span the Tableau stack (data – analytics – people engineering – toolmaking). He or she isn’t shy about moving data, writing SQL, and automating analytics pipelines.

That said, this solution takes the trouble out of those repeated steps and automates the process. We do the following:

  • Wait for a file to change
  • Upload it to a database (choose your flavor)
  • Make any matches against other data
  • Drop the results in a CSV file
  • Ping Slack to tell a group and/or individual that their file is ready for Tableau
  • Forward the results to a logging tool (choose your flavor)

Tools needed:

  • PowerGUI (needed to compile the script into a service)
  • Slack (not necessary, but helpful for alerts/notifications/collaboration)
  • PowerShell v3 or later

Part I: The Working File

This part is pretty easy. All it takes is a shared file (Dropbox or the like) that users update with either a single email address or a list of emails. NOTE: this is the only manual step in the process. It also doesn’t have to be an email (as you’ll see toward the end); it could be anything you want to trigger automatically (extracts, alerts, emails, reports, etc.).

[Image posh_as_service_3: The only manual part of this process]
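
As a concrete (and entirely hypothetical) example, kicking off the pipeline is nothing more than a user appending an address to the shared file; the path and file name here are placeholders:

# Hypothetical: a user adds an email address to the watched file.
Add-Content -Path 'C:\Shared\emails.csv' -Value 'jane.doe@example.com'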


Part II: Wait and Watch (files)

This is the part that requires the most work. Since we are using the FileSystemWatcher class, we’ll want to be specific about what we watch and which event type we watch for (change, create, delete). In our case, we are watching a directory and a specific file name for changes. Once those changes happen, we’ll run our script (see below for a sample). Your script should loop forever, albeit gracefully, and do the proper logging. Because we rely on the $Event variable from our Action scriptblock, it helps to log the event as it happens; we’ll lose it once the scriptblock ends.
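
The original sample script isn’t reproduced here, so the following is a minimal sketch of the pattern just described; the watched directory, file name, and log path are placeholders:

# A minimal sketch of the watcher; paths and file names are placeholders.
$watcher = New-Object System.IO.FileSystemWatcher
$watcher.Path   = 'C:\Shared'
$watcher.Filter = 'emails.csv'
$watcher.EnableRaisingEvents = $true

Register-ObjectEvent -InputObject $watcher -EventName Changed -Action {
    # Log the event details immediately; $Event is gone once the scriptblock ends.
    $details = '{0} | {1} | {2}' -f $Event.TimeGenerated,
        $Event.SourceEventArgs.ChangeType, $Event.SourceEventArgs.FullPath
    Add-Content -Path 'C:\Logs\watcher.log' -Value $details

    # ...from here: upload to the database, make matches, drop the CSV,
    # ping Slack (Part IV), and forward to your logging tool...
}

# Loop forever, but gracefully, so the compiled service stays alive.
while ($true) { Start-Sleep -Seconds 5 }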


Part III: Compile the Script & Install Service

This part, in addition to Part I, is pretty simple. Once you have your script settled and functional, it’s just a matter of choosing an option in PowerGUI.

Once you’ve compiled the script, you can either use the New-Service PowerShell cmdlet or just choose ‘Compile Script into Service’ from the PowerGUI menu. If you choose that option, all that’s needed from that point is: <someCompiledScript.exe> -Service Install
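
If you’d rather register the compiled exe yourself, a hedged sketch with New-Service might look like this (the service name, display name, and binary path are placeholders; check what arguments PowerGUI’s service wrapper expects):

# Hypothetical registration of the compiled watcher as a Windows service.
New-Service -Name 'FileWatcherSvc' `
            -DisplayName 'Tableau File Watcher' `
            -BinaryPathName 'C:\Services\someCompiledScript.exe' `
            -StartupType Automatic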


Part IV: Integrate with Slack

While not absolutely necessary, this part will tell a user (or group) that their file is ready for analysis in Tableau. You could also attach the file itself, but that might be overkill since we drop the results back where the user updated the file (see Part I). It’s a simple API call to the chat.postMessage method.
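
For reference, the ping can be a single Invoke-RestMethod call against chat.postMessage; $token and $channel here are placeholders for your own API token and channel:

# A minimal sketch of the Slack alert; $token and $channel are placeholders.
$body = @{
    token   = $token
    channel = $channel
    text    = 'Your file has been processed and is ready for Tableau.'
}
Invoke-RestMethod -Uri 'https://slack.com/api/chat.postMessage' -Method Post -Body $body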

[Image posh_as_service_1: Sample alert to Slack]

Other Use Cases:

There are plenty of ways in which a FileSystemWatcher script running as a service can be useful. Here are a few that we’ve done:

  • Immediate Tableau Server password reset (see how we’ve done this here)
  • User / data lookup against various databases
  • Extract triggers (done at the end of an analytics pipeline)
  • Workflow-related ETL (post coming soon)
  • Automatic backup & restore on development/beta machines
  • Tableau Server ‘garbage collection’ (removing old stuff)
  • Automated TWB Internals application (drop a twb in a directory and output all the details)