Tableau Vapor Rub: Getting the goodness of Tableau, all over.

Some background:

While I had originally joked about the ‘Vapor Rub’ title, it has really, er, stuck to me at this point. It sort of describes the wonder of getting the right data into Tableau: once it’s there and properly formatted, everything just feels better.

[Image vapor_rub_0: The ideal state.]

Part I: Getting the data from all the places

The goal, then, is to get all the data we need for analysis into some sort of table (or tables). Ideally it would be one table, but when that isn’t possible, we’ll create our own data model. If your organization has data in a multitude of places, it becomes challenging for Tableau developers to acquire it and shape it into useful analysis without sacrificing performance. Adding additional data sources can, depending on your approach, slow your workbook. And when we’re talking about massive data, performance will suffer if we don’t approach it efficiently.


Part II: Processing the data by distributing the load across computers (PowerShell Workflows)

Here’s an example: pivoting data, a lot of data. We needed to pivot pipe-delimited fields into their own columns, add the results to a database and then create a Tableau Server data source. A few caveats:

  • It needs to happen in sequence (part A needs to happen before part B)
  • It could be a long-running task, so if a part fails, I want the code to pick up where it left off
  • I need to distribute the load across computers because this is a lot of data

Now, grabbing the data via code isn’t that difficult a task. We iron out our approach, sketch out the table or tables we might need to create and, hopefully, the data speaks. To do all of this, we’ll use PowerShell Workflows.
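To make the pivot concrete, here’s a minimal sketch of splitting a pipe-delimited field into its own columns before any workflow gets involved. The file paths, headers, and attribute names (Color, Size, Material) are hypothetical; the point is the -split on the pipe and the reshaped object that comes out.

```powershell
# Hypothetical input rows look like "12345,red|large|cotton", where the second
# field packs three attributes behind pipes.
Import-Csv -Path 'D:\raw\items.csv' -Header 'Id', 'Attributes' |
    ForEach-Object {
        # -split takes a regex, so the pipe character has to be escaped
        $parts = $_.Attributes -split '\|'
        [pscustomobject]@{
            Id       = $_.Id
            Color    = $parts[0]
            Size     = $parts[1]
            Material = $parts[2]
        }
    } |
    Export-Csv -Path 'D:\staged\items_pivoted.csv' -NoTypeInformation
```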

Workflows are built around activities. Activities let us implement logic that wouldn’t necessarily fit within the function paradigm (though the two are related to some extent). Each activity exists on its own, so ‘bucketing’ each piece of your code helps a lot.
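As a minimal sketch of that bucketing (the path below is made up), each top-level statement in a workflow becomes its own activity, and InlineScript wraps ordinary PowerShell that needs to run as one unit:

```powershell
workflow Invoke-DataPrep {
    # Each top-level statement is an activity; InlineScript runs regular PowerShell as one unit.
    $files = InlineScript { Get-ChildItem -Path 'D:\raw' -Filter '*.txt' }

    # Workflow-level variables flow between activities; inside InlineScript, reach them via $using:
    InlineScript { Write-Output "Found $(($using:files).Count) files to process" }
}
```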

The difficult part is making sure we add the right checkpoints and split the load accordingly; we have a ton of files that need data pivoted, and doing that work on one machine isn’t ideal. What’s more, if we do all the hard work of parsing and it fails, or the computer crashes, we would need to start over. Having a checkpoint (via Checkpoint-Workflow) makes life a lot easier. At a very high level, our process will look like this (a sketch follows the list):

  • Find files/data (S)
  • Do the parsing/splitting, etc (S)
  • Add to database (P)
  • Create Tableau data source (P)
  • Alert users (via Slack) (P/S)

NOTE: S = Sequence; P = Parallel
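
Here’s a rough skeleton of that flow, with Checkpoint-Workflow dropped in after the expensive steps. The helper scripts (Split-PipeDelimited.ps1, Add-ToDatabase.ps1, New-TableauDataSource.ps1, Send-SlackAlert.ps1) are hypothetical stand-ins for the real work, and it assumes every machine in $ComputerName can see the share and works on its own slice of the files.

```powershell
workflow Invoke-VaporRub {
    param(
        [string[]] $ComputerName,   # machines that will share the parsing load
        [string]   $SourcePath      # where the raw pipe-delimited files live
    )

    # (S) Find the files/data.
    $files = InlineScript { Get-ChildItem -Path $using:SourcePath -Filter '*.txt' }
    Checkpoint-Workflow   # if anything downstream dies, resume from here on restart

    # (S) Do the parsing/splitting, fanned out across computers via -PSComputerName.
    foreach -parallel ($computer in $ComputerName) {
        InlineScript {
            & 'D:\scripts\Split-PipeDelimited.ps1' -Path $using:SourcePath   # hypothetical parsing script
        } -PSComputerName $computer
    }
    Checkpoint-Workflow

    # (P) Load the database and build the Tableau data source side by side.
    parallel {
        InlineScript { & 'D:\scripts\Add-ToDatabase.ps1' }          # hypothetical
        InlineScript { & 'D:\scripts\New-TableauDataSource.ps1' }   # hypothetical
    }

    # (P/S) Tell the users it's done.
    InlineScript {
        & 'D:\scripts\Send-SlackAlert.ps1' -Message "Pivoted $(($using:files).Count) files"   # hypothetical
    }
}
```

If the job suspends or the machine restarts, the usual pattern is to resume the workflow job with Resume-Job; the checkpoints determine where it picks back up.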

 

Part III: Adding consolidated data to a database

After doing the work of parsing and pivoting our data, we now need to add it to the database. This is also pretty straightforward once you’ve done it before, and the right connection string can make all the difference. As mentioned above, this step runs in parallel.
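As a sketch of that load, this assumes SQL Server, a staging table named dbo.ItemsPivoted, and the pivoted CSV from earlier; the connection string and column names are placeholders.

```powershell
# Hypothetical connection string and destination table.
$connectionString = 'Server=sql01;Database=Staging;Integrated Security=True'

# Build an in-memory table that mirrors the pivoted CSV.
$table = New-Object System.Data.DataTable
'Id', 'Color', 'Size', 'Material' | ForEach-Object { [void]$table.Columns.Add($_) }

Import-Csv -Path 'D:\staged\items_pivoted.csv' | ForEach-Object {
    [void]$table.Rows.Add($_.Id, $_.Color, $_.Size, $_.Material)
}

# Bulk insert in one shot rather than row-by-row INSERTs.
$bulk = New-Object System.Data.SqlClient.SqlBulkCopy($connectionString)
$bulk.DestinationTableName = 'dbo.ItemsPivoted'
$bulk.WriteToServer($table)
$bulk.Close()
```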

Part IV: Consume in Tableau

Having done the work mentioned above, creating a Tableau Server data source is the easy part. We can dynamically create the extract or simply lean on our typical workflow (connect to the data and publish to Server).
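One way to do the publish from the same script is tabcmd, assuming it’s installed and on the PATH; the server URL, credentials, project, and file name below are placeholders.

```powershell
# Sign in once, publish the extract, sign out. --overwrite replaces an existing
# data source with the same name.
& tabcmd login -s 'https://tableau.example.com' -u 'svc_tableau' -p $env:TABLEAU_PASSWORD
& tabcmd publish 'D:\staged\items_pivoted.tde' -n 'Items Pivoted' --project 'Analytics' --overwrite
& tabcmd logout
```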

Other Applications:

In doing this, you might be wondering what other ways we can leverage PowerShell Workflows along with Tableau Server. Here are a few that work very well:

  • Tableau Extract Mover/Creator
  • Backups / Restores to various environments
  • CSV to TDE creator on a large scale
  • Creating custom reports/data files for Internal & External Users
  • A Vizable file delivery platform
