Tableau Garbage Collector and Metadata Discovery

In the beginning, there were TWBs

Ok, so it’s not really ‘garbage’ because people put effort into designing and creating both their workbooks and Tableau Server data sources. What I’m talking about here is an outcome of a self-service analytics environment: content lands in projects and sites, and it keeps growing. While the work may have been useful at one point, unused workbooks and data sources still consume valuable resources.

[Image: Our sample code snippet]

To keep the environment swift and snappy, we’ll want to archive and then remove the old materials. The beauty of this is that one query serves two major areas of management: Tableau Garbage Collection and Tableau Metadata Discovery. In doing so, we’ll follow these steps:

  • Export old stuff from Tableau (twb & tds)
  • Store on Amazon S3
  • Remove locally
  • Update channel/users via Slack

Tools needed:

  • Tableau Server admin (for Postgres access and REST API)
  • Slack (not necessary, but helpful for alerts/notification/collaboration)
  • Amazon AWS account (not necessary, but also helpful for storage)
  • PowerShell > v3

One Query to Rule Them All

This query will allow the admin to control/remove everything they need from Tableau Server. Here are the 11 fields:

  • Id: Id for the twb(x) or tds(x)
  • Last_view_time: max time the object was viewed
  • Luid: luid for twb(x) or tds(x)
  • Site_id: site id
  • Site_name: site name
  • Site_name_proper: site name for REST API
  • Site_luid: site luid
  • Export_name: name for export
  • HasExtract: whether or not there is a TDE
  • ExportType: datasources/workbooks endpoint for REST API
  • OID: ID for object in Postgres (lo_export)

Yes, that’s it. Now we’ll write a daily/weekly/monthly collection script that will clean up everything from Tableau Server.

[Image: Main fields for GC]
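As a rough illustration of how the fields listed above might drive a collection run, here is a minimal Python sketch (the post itself uses PowerShell). It assumes each query row arrives as a dict keyed by lowercase field names, that "old" means not viewed within a configurable number of days, and that never-viewed items count as old. The function names, the 180-day default, the sample rows, and the server URL and API version in the usage note are all hypothetical. The URL shape follows the REST API download endpoints for workbooks and data sources.

```python
from datetime import datetime, timedelta


def select_stale(rows, now, max_age_days=180):
    """Return rows not viewed within max_age_days (assumption: never viewed counts as stale)."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        r for r in rows
        if r["last_view_time"] is None or r["last_view_time"] < cutoff
    ]


def content_url(server, api_version, row):
    """Build the REST API download URL from Site_luid, ExportType and Luid."""
    return (
        f"{server}/api/{api_version}/sites/{row['site_luid']}"
        f"/{row['export_type']}/{row['luid']}/content"
    )


# Made-up rows shaped like the query output; only the fields used here are shown.
rows = [
    {"id": 1, "luid": "w-1", "site_luid": "s-1",
     "export_type": "workbooks", "last_view_time": datetime(2016, 1, 15)},
    {"id": 2, "luid": "d-2", "site_luid": "s-1",
     "export_type": "datasources", "last_view_time": datetime(2016, 12, 20)},
]

stale = select_stale(rows, now=datetime(2017, 1, 1))
```

Calling `content_url("https://tableau.example.com", "2.3", stale[0])` would give the address to download the first stale workbook before it is removed; use whichever API version your server supports.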

Oh wait, it’s gone, now I suddenly need it

Ok, so perhaps we were a bit overzealous and removed too much. Have no fear, because this process also archives everything on Amazon S3. If it’s not on Tableau Server, it’s on Amazon and easy to recover and restore. Once we take a diff between what we have locally and what we have on S3, we’ll remove everything locally. Now, older, er, retro workbooks will forever be saved.
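The diff step could look something like the following Python sketch. It takes one cautious reading of the process: a local export is only deleted once a matching key is confirmed on S3, and anything else is held back for upload. The function name, the key prefix, and the file names are hypothetical, and listing the bucket (for example with an S3 client) is left out so the logic stays self-contained.

```python
def plan_local_removal(local_files, s3_keys, prefix=""):
    """Split local exports into (safe_to_delete, still_to_upload).

    local_files: file names exported from Tableau Server.
    s3_keys: object keys already present in the archive bucket.
    prefix: optional key prefix under which the exports are stored.
    """
    archived = {k[len(prefix):] for k in s3_keys if k.startswith(prefix)}
    safe_to_delete = sorted(f for f in local_files if f in archived)
    still_to_upload = sorted(f for f in local_files if f not in archived)
    return safe_to_delete, still_to_upload


# Made-up example: two exports are archived, one is not yet on S3.
local = ["sales.twbx", "finance.tdsx", "old_dashboard.twb"]
keys = ["tableau-archive/sales.twbx", "tableau-archive/old_dashboard.twb"]
safe, pending = plan_local_removal(local, keys, prefix="tableau-archive/")
```

Here `safe` holds the files that can be removed locally and `pending` holds the ones that still need to be uploaded before anything is deleted.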

Send a Summary to the Admins: It’s full of TWBs

Since this is automated, we keep a history of what was removed and, after the script runs, we send a summary to our Slack admin channel. This is the last step in the process and gives info such as total disk space recovered, total TWBs removed, etc.

[Image: Sample data sent to Slack]
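A summary like this can be built and posted with very little code. The Python sketch below assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. The message wording, the function names, and the `export_type` and `size_bytes` keys on each removed item are hypothetical, not the format the original script sends.

```python
import json
import urllib.request


def human_size(num_bytes):
    """Format a byte count using binary units, e.g. 1572864 -> '1.5 MB'."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024


def build_summary(removed):
    """Build the JSON payload for a Slack incoming webhook."""
    workbooks = sum(1 for r in removed if r["export_type"] == "workbooks")
    datasources = sum(1 for r in removed if r["export_type"] == "datasources")
    recovered = sum(r["size_bytes"] for r in removed)
    text = (
        f"Tableau GC summary: {workbooks} workbook(s), "
        f"{datasources} data source(s) removed; "
        f"{human_size(recovered)} recovered."
    )
    return json.dumps({"text": text})


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook URL; returns the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Made-up history of removed items.
removed = [
    {"export_type": "workbooks", "size_bytes": 1048576},
    {"export_type": "datasources", "size_bytes": 524288},
]
payload = build_summary(removed)
```

Passing `payload` to `post_to_slack` along with your own webhook URL would send the message to the admin channel.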

The other side of the coin

You may have noticed the ‘OID’ field in the query. This is the easiest way to export metadata info from Tableau (ever try to download a 750MB workbook?). And combining this with what I’ve discussed here and here will make your life so much more fabulous. An example workflow would be:

  • Export TWB / TWBX / TDS / TDSX
  • Parse metadata info (columns, rows, filters, and data sources).
  • Use Tableau class to export custom sql code
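To give a feel for the parsing step, here is a small Python sketch that reads a TWB file as XML and pulls out data source names, column names, and custom SQL. It is a stand-in for illustration, not the Tableau class mentioned above. It assumes the usual TWB layout, where top-level `datasource` elements sit under `datasources` and custom SQL is stored as the text of a `relation` element with `type='text'`. The sample document is made up and heavily trimmed.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed TWB content for illustration only.
SAMPLE_TWB = """<workbook>
  <datasources>
    <datasource caption='Sales' name='sqlproxy.1'>
      <connection class='postgres'>
        <relation name='Custom SQL Query' type='text'>SELECT id, amount FROM sales</relation>
      </connection>
      <column datatype='integer' name='[id]' role='dimension' type='ordinal' />
      <column datatype='real' name='[amount]' role='measure' type='quantitative' />
    </datasource>
  </datasources>
</workbook>"""


def parse_twb(xml_text):
    """Return one dict per top-level data source with its columns and custom SQL."""
    root = ET.fromstring(xml_text)
    result = []
    # Only top-level data sources; worksheets repeat them as references.
    for ds in root.findall("./datasources/datasource"):
        result.append({
            "name": ds.get("caption") or ds.get("name"),
            "columns": [col.get("name") for col in ds.findall("column")],
            "custom_sql": [
                rel.text.strip()
                for rel in ds.iter("relation")
                if rel.get("type") == "text" and rel.text
            ],
        })
    return result


metadata = parse_twb(SAMPLE_TWB)
```

A TWBX or TDSX is a zip archive, so the TWB or TDS inside would need to be extracted first (for example with Python's `zipfile` module) before parsing.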

