Tableau Vapor Rub: Getting the goodness of Tableau, all over.

Some background:

While I had originally joked about the ‘Vapor Rub’ title, it has really, er, stuck to me at this point. It sort of describes the wonder of getting the right data into Tableau: once it’s there and properly formatted, everything just feels better.

[Image vapor_rub_0: The ideal state.]

Part I: Getting the data from all the places

The goal, then, is to get all the data we need for analysis into some sort of table (or tables). Ideally it would be one table, but when that isn’t possible, we’ll create our own data model. If your organization has data in a multitude of places, it becomes challenging for Tableau developers to acquire it and shape it into useful analysis without sacrificing performance. Adding data sources can, depending on your approach, slow your workbook. And when we’re talking about massive data, performance will suffer if one doesn’t approach it efficiently.

[Image vapor_rub_1]

Part II: Processing the data by distributing the load across computers (PowerShell Workflows)

Here’s an example: pivoting data, a lot of data. We needed to pivot pipe-delimited fields into their own columns, add the results to a database, and then create a Tableau Server data source. A few requirements:

  • Parts need to happen in sequence (part A needs to happen before part B)
  • It could be a long-running task, so if a part fails, I want the code to pick up where it left off
  • I need to distribute the load across computers because this is a lot of data

Now, grabbing the data via code isn’t that difficult a task. We iron out our approach, sketch out the table or tables we might need to create and, hopefully, the data speaks. To do all this, we’ll use PowerShell Workflows.

Workflows are built around activities. These allow us to implement logic that wouldn’t necessarily fit within the function paradigm (though the two are related to some extent). Activities exist on their own, so ‘bucketing’ each piece of your code helps a lot.

The difficult part is making sure we add the right checkpoints and split the load accordingly; we have a ton of files that need data pivoted, and doing that work on one machine isn’t ideal. What’s more, if we do all the hard work of parsing and it fails, or the computer crashes, we would need to start over. Having a checkpoint (via Checkpoint-Workflow) makes life a lot easier. At a very high level, our process will look like this (a sketch follows the list):

  • Find files/data (S)
  • Do the parsing/splitting, etc (S)
  • Add to database (P)
  • Create Tableau data source (P)
  • Alert users (via Slack) (P/S)

NOTE: S = Sequence; P = Parallel
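
Here’s a minimal sketch of what that workflow might look like. Convert-PipeDelimitedFile, Add-PivotedDataToDatabase, Publish-TableauDataSource, and Send-SlackMessage are hypothetical helpers standing in for your own code; everything else is stock PowerShell Workflow syntax:

    workflow Invoke-PivotPipeline {
        param([string]$SourcePath)

        # (S) Find files/data
        $files = Get-ChildItem -Path $SourcePath -Filter '*.dat'

        # (S) Do the parsing/splitting; a checkpoint after each file means a
        # failure or crash resumes here rather than starting over
        foreach ($file in $files) {
            Convert-PipeDelimitedFile -Path $file.FullName   # hypothetical pivot helper
            Checkpoint-Workflow
        }

        # (P) Add to database and create the Tableau data source in parallel
        parallel {
            Add-PivotedDataToDatabase -Path $SourcePath      # hypothetical
            Publish-TableauDataSource -Name 'PivotedData'    # hypothetical
        }

        # (P/S) Alert users via Slack (hypothetical wrapper; Part IV of the
        # file-watching write-up shows the raw API call)
        Send-SlackMessage -Text 'Pivot pipeline complete.'
    }

Because every workflow gets the -PSComputerName common parameter for free, distributing the load is just a matter of naming the machines: Invoke-PivotPipeline -SourcePath 'D:\data' -PSComputerName 'box1', 'box2'.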


Part III: Adding consolidated data to a database

After doing the work of parsing and pivoting our data, we now need to add it to the database. This is also pretty straightforward once you’ve done it a time or two, and the right connection string can make all the difference. As mentioned above, this will be done in parallel.
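
To make that concrete, here’s one way the load might look from PowerShell, assuming SQL Server as your flavor; the connection string, table name, and $dataTable (a System.Data.DataTable built from the pivoted files) are placeholders:

    # Bulk-load the pivoted data into SQL Server
    $connectionString = 'Server=myServer;Database=myDb;Integrated Security=True;'
    $bulkCopy = New-Object System.Data.SqlClient.SqlBulkCopy($connectionString)
    $bulkCopy.DestinationTableName = 'dbo.PivotedData'   # placeholder table
    $bulkCopy.BatchSize = 50000                          # tune for your volume
    $bulkCopy.WriteToServer($dataTable)                  # DataTable built upstream
    $bulkCopy.Close()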

Part IV: Consume in Tableau

Having done the work mentioned above, making a Tableau Server data source is the easy part. We can dynamically create the extract or simply leverage our typical workflow (connect to the data and publish to Server).
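
If you take the typical-workflow route, the publish step scripts nicely with tabcmd; the server URL, credentials, project, and file name below are placeholders:

    tabcmd login -s 'https://tableau.example.com' -u 'publisher' -p 'password'
    tabcmd publish 'PivotedData.tdsx' --name 'Pivoted Data' --project 'Analytics' --overwrite
    tabcmd logout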

Other Applications:

In doing this, you might be wondering what other ways we can leverage PowerShell Workflows along with Tableau Server. Here are a few that work very well:

  • Tableau Extract Mover/Creator
  • Backups / Restores to various environments
  • CSV to TDE creator on a large scale
  • Creating custom reports/data files for Internal & External Users
  • A Vizable file delivery platform

Tableau Server: File Watching and Alerting

Some background:

As it happens, a lot of the fun Tableau Server ideas come from the need to optimize a person’s access to data. When the data isn’t correct or is improperly formatted, Tableau development and analytics take a back seat to ETL work. Thankfully, new features in Tableau make data preparation much easier and faster.

In the interim, however, there are often gaps between what the tools cover and the unique circumstances each enterprise / business deals with. Put simply, data doesn’t always play nice. A modern analyst is one who can span the Tableau stack (data – analytics – people engineering – toolmaking). He or she isn’t shy about moving data, writing SQL, and automating analytics pipelines.

That said, this solution takes the trouble out of those repeated steps and automates the process. We do the following:

  • Wait for a file to change
  • Upload it to a database (choose your flavor)
  • Make any matches against other data
  • Drop the results into a csv file
  • Ping Slack to tell a group and/or individual that their file is ready for Tableau
  • Forward the results to a logging tool (choose your flavor)


Tools needed:

  • PowerGUI (needed to compile the script into a service)
  • Slack (not necessary, but helpful for alerts/notifications/collaboration)
  • PowerShell v3 or later


Part I: The Working File

This part is pretty easy. All it takes is a shared file (Dropbox or the like) that users will update with either an email address or some list of emails. NOTE: this is the only manual step in this process. It also doesn’t have to be an email (as you’ll see toward the end); it could be anything you want triggered automatically (extracts, alerts, emails, reports, etc.).

[Image posh_as_service_3: The only manual part of this process.]


Part II: Wait and Watch (files)

This is the part that requires the most work. Since we are using the FileSystemWatcher class, we’ll want to be specific about what we want to watch and which event types (change, create, delete) we care about. In our case, we are watching a directory and a specific file name for changes. Once those changes happen, we’ll run our script (see below for a sample). Your script should loop forever, albeit gracefully, and do the proper logging. Since we are using this class, we’ll rely on the $Event variable inside our Action scriptblock. It helps to log the event as it happens, because we lose it once the scriptblock ends.
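
Here’s a trimmed-down sample of the watcher; the directory, file name, and log path are placeholders, and the real work (the upload, the matching, and so on) belongs inside the Action scriptblock:

    $watcher = New-Object System.IO.FileSystemWatcher
    $watcher.Path = 'C:\Dropbox\Shared'            # placeholder directory
    $watcher.Filter = 'working_file.csv'           # the specific file we watch
    $watcher.NotifyFilter = [System.IO.NotifyFilters]::LastWrite
    $watcher.EnableRaisingEvents = $true

    Register-ObjectEvent -InputObject $watcher -EventName Changed -SourceIdentifier FileChanged -Action {
        # $Event disappears once this scriptblock ends, so log it immediately
        $path = $Event.SourceEventArgs.FullPath
        Add-Content -Path 'C:\logs\watcher.log' -Value "$(Get-Date -Format s) changed: $path"
        # ...upload to the database, run the matches, drop the csv, ping Slack...
    }

    # Loop forever, gracefully
    while ($true) { Start-Sleep -Seconds 5 }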


Part III: Compile the Script & Install Service

This part, in addition to Part I, is pretty simple. Once you have your script settled and functional, it’s just a matter of choosing an option in PowerGUI.

Once you’ve compiled the script, you can either use the New-Service PowerShell cmdlet or just choose ‘Compile Script into Service’ from the PowerGUI menu. If you choose that option, all that’s needed from that point is: <someCompiledScript.exe> -Service Install
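
If you go the cmdlet route instead, the call is short; the service name, display name, and binary path below are placeholders:

    New-Service -Name 'TableauFileWatcher' `
        -BinaryPathName 'C:\services\someCompiledScript.exe' `
        -DisplayName 'Tableau File Watcher' `
        -StartupType Automatic
    Start-Service -Name 'TableauFileWatcher'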


Part IV: Integrate with Slack

While not absolutely necessary, this part will tell a user (or group) that their file is ready for analysis in Tableau. You could also attach the file, but that might be a bit much since we’ll drop the file back where the user updated it (see Part I). It’s a simple API call to the chat.postMessage method.

[Image posh_as_service_1: Sample alert to Slack.]
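
The call itself is a one-liner with Invoke-RestMethod; the token, channel, and message text below are placeholders:

    $headers = @{ Authorization = 'Bearer xoxb-your-slack-token' }   # placeholder token
    $body = @{
        channel = '#analytics'                                       # placeholder channel
        text    = 'Your file is ready for Tableau: \\share\results\matched.csv'
    }
    Invoke-RestMethod -Uri 'https://slack.com/api/chat.postMessage' -Method Post -Headers $headers -Body $body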

Other Use Cases:

There are plenty of ways in which a FileSystemWatcher script running as a service can be useful. Here are a few that we’ve done:

  • Immediate Tableau Server password reset (see how we’ve done this here)
  • User / data lookup against various databases
  • Extract triggers (done at the end of an analytics pipeline)
  • ETL that is Workflow related (post coming soon)
  • Automatic backup & restore on development/beta machines
  • Tableau Server ‘garbage collection’ (removing old stuff)
  • Automated TWB Internals application (drop a twb in a directory and output all the details)