Tableau Permissions: Automated

Here’s the slide deck from the Tableau Admin group meeting. Please let me know if there are questions. A few things to note:

1.) There is a complete ‘User’ module that does the following:

  • add/remove user via local auth
  • add/remove user via AD
  • add/remove groups

2.) These are meant to be used together. For example, you’d first add a group, then run the Set-TsPermission function. Oh, and, it also removes the ‘All Users’ group permissions.

And, again, part of permissions is the monitoring aspect: if you’re not aware of what’s happening, it’s very hard to adjust and enable permissions in a way that is effective.

 

Use Tableau dashboards and write data to a database

That’s right! Enter any value in your Parameter field(s) and have data written to a database (choose your flavor).  Keep history, study filter trends (and timing), leverage it as a form-enabler and/or data analytics tool. Now Tableau *can* take user input and write that input to a database.

I’ll let that soak in…Filter/Parameter values being written back to a database.

tableau-database - Page 1

Okay, here’s how to do it.

Step 1

Make a Dashboard and toss some Parameters on it (can be any data type). NOTE: you can also use filters because that data is also logged. But, parameters are fun because so many people think they aren’t dynamic.  It’s time to prove they are, in fact, very dynamic.

Step 2

Use the logs.

Every event performed on a dashboard/workbook is logged. It’s just a matter of finding the right key/value pair, parsing, and then adding to a database. For example, if I have a free-form parameter value and I enter, ‘2,000,000’, it is logged along with the parameter number (note this is different than your parameter caption) as well as user, session, request, and a bunch of other stuff.

This means you can keep a history of who changed the parameter, the old value, the new value, and a whole bunch of other stuff.

Step 3

Create some tables in your database. (NOTE: you only do this 1x)

This is simple if you just want to keep a log of the changing parameter values. If, on the other hand, you want to keep performance metrics, track which workbook/sheets are being used, then I’d recommend you also parse and create analysis on the Apache logs. I’ve talked about this a lot here, here, here and here.

Step 4

Parse the logs and add to the tables.

Simple since the main log is in json format. Use some PowerShell and you can do this in a simple function/module: 1 for pulling the log data and 1 for adding to a database.

If you have a log analytics tool in place, this becomes insanely easy since all your logs are centralized (they are, right?!). If not, you can just parse the log file in the default directory.

Step 5

Enjoy!

Use Cases

  • As a light-CRM
  • Data ‘picker’ tool for template dashboards (for example: choose between line/bar chart, etc and data gets written to database and picked up by another script that makes the workbook for you, based on those choices).
  • Replacement for embedded forms
  • Full cycle analytics (from data entry to data analysis, quantified self stuff)
  • Triggers (for example: User A picks Parameter B and, a csv and/or png file is delivered)
  • Usage data about dashboards (for example: if you create 15 parameters and users only ever use 2, then you can remove the unnecessary stuff).
  • Permission approval form (for example: User A request permission to Dashboard B)
  • Subscription forms
  • Form for filtered queries (for example: use fields to help analysts who might not know SQL to create custom SQL based on set fields/values)
  • And more!

What is Analytics Engineering?!

“It is easier to ship recipes than cakes and biscuits.” -John Maynard Keynes

ae-1

As an Analytics Engineer, you may hear this a lot: ‘Wait, so you just make dashboards and reports, right?’ It’s common but often based off a deep misunderstanding about the analytics process. Let me attempt to clarify.

As an Analytics Engineer, you are responsible for moving the analytics and data processes into the layer above the data warehouse. Essentially, your task is to create something from the basic set of data and shape it into something both actionable and useful.  Naturally, this task lends itself to a broad and narrow set of skills. And it is probably why people are so quick to assume you just ‘make reports and dashboards.’ They see the result of the work, not the process that created that report and dashboard. They did not see you toil with a dataset too wide and at the wrong granularity. They did not see you automate the ETL of said table into an aggregated table with security, alerting and logging baked into the process. And, after all that, they did not see you worry about font, color, and styling of the dashboard. Because the effects of a terrible design are catastrophic for your message.

An Analytics Engineer must be, primarily, a technologist and thinker. With the sheer volume of options available (throughout the Analytics and Data layers), context and task shifting are an understatement. It’s common to spend most of the day at the data (SQL) layer, while spending a smaller portion of time creating and implementing a dashboard tool. And it’s that balance of time which allows for functional dashboards.

One vital piece that is worth mentioning (and possibility elaborating on) is how this field is not solely locked into one technology or stack (for example: LAMP stack, cloud vendors or analytics tools); rather, this field should be able to use all the tools and should leverage its own, more abstract, stack unrelated to a technology. So, today it might be MySQL but tomorrow it could be Redshift or the like. Therein rests the challenge: being able to adapt to the tool at hand without missing a beat.

 

ae-2

Analytics as a Service: Tableau Rule 1 – Infrastructure

 

ts-cloud-element-1

Leverage the cloud

Elasticity

Don’t be constrained or afraid to combine pieces of cloud technologies to serve the Analytics structure.

Become durable / resilient

Just in case

Even though there are very high monthly uptime percentages, just be prepared for something to break. If you do that, you’ll have even *more* creative freedom (crazy, huh?).

Choose to: (1) scale laterally or (2) scale vertically

This is all about re-framing the question around Projects vs Sites.

Why you have Sites over Projects or Projects over Sites? And that can’t be the only choice, right? (Hint: it’s not the only choice)

I’ve seen benefits to both but the extra work involved with Sites make scaling laterally (Sites) much more difficult than vertically (Projects), not to mention the challenges of stepping into the compliance realm.

Eliminate waste

Storage

Remove all the pieces from your base install that can be done elsewhere (eg: collect the ‘garbage’ but store on AWS S3 with a good lifecycle policy). That way, your Analytics infra is light and fast.

I challenge you to think *bigger* with Tableau.  How can you provide more fluid access to insight than anything else?

-Mike

 

Preserve Order (CSV Order in Tableau)

Ever created a wonderful Tableau dashboard with the added ‘Export to CSV’ functionality? We all have. Click the super-sleek Excel icon and, viola, the download begins. Send the file, walk away and think: ‘my, was that cool.’

ts-csv-twb
Nice and ordered.

But wait. You get an email complaining about column order. For some reason, the columns you’ve added, perfectly, are all messed up. In fact, some would say they’re in alphabetical order. What the?!

ts-csv-file
Oh come on.

Anyway, here’s an easy PowerShell function that will fix that and, send the email with the columns in the correct order.

 

Tableau Server – Revision History, er, revisited.

Problem

There are plenty of ways to make a good backup of your analytics content and the options available on Tableau Server are numerous. But, and here’s the better question: are they efficient and redundant enough?

Yes, current Server versions allow for n number of days of backup for content but this slowly increases the size of your backup and storage (and puts too many eggs in one basket). Plus, there’s no effective way to turn this option off if you have multiple sites (at least that I’m aware of). What’s more, you can get away with a great daily backup strategy and subsequent (automatic) restore to your development machine.

ts-content-backup-rev-1
This could escalate quickly.

To add to the complexity, what if you don’t want to use the CLIs (tabcmd) to download massive data sources (>1GB)? What about users who, through no fault of their own, just click ‘download’ from the GUI? Do you, as Server admins, know the impact this has on the Server? Hint: you should and it’s bad.

Solution

Have users drop the name of their desired content in a shared file (or dedicated Slack channel) and then have daily backups done without using tabcmd or selecting ‘download’ from the GUI. Bonus: ship to AWS S3 and recover that space on your machine! Bigger bonus: logging.

Here’s what you’re going to do at a very high level:

  • Write super SQL that can dynamically get all the info you need (twb/twbx/tds/tdsx)
  • Use psql and the lo_export function to get this ^
  • This ^ won’t get you the TDE (if there is one) so you need to find it on the filesystem
  • Use the ‘extract’ table and get the UUID for where this ^ is stored in the filesystem
  • Parse the XML to update the location of the TDE (Soapbox: for those of you who think it’s ‘hacking’ XML, please make sure you RTFM).
  • Zip it up and send to AWS S3 and get it off your Server machine

Is that a lot of steps? Maybe. But this whole process is automated. Do a little work up front and you save yourself a lot of time down the line, not to mention a lot of space on your machine. Plus, your Server infra keeps humming along without the added load of multiple versions of your content. You also don’t need to worry about (1) versioning and (2) installing tabcmd.

ts-revision-history
Using AWS with Tableau

Here’s a sample of SQL you should write to scale to whatever content you’d need to backup and version.

select
  ds.id as "id"
, ds.luid as "luid"
, ds.site_id as "site_id"
, s.name as "site_name"
, s.luid as "site_luid"
, case when ds.data_engine_extracts = TRUE THEN lower(ds.repository_url)||'.tdsx' ELSE lower(ds.repository_url)||'.tds' end as "export_name"
, ds.data_engine_extracts as "hasExtract"
/*, ds.repository_data_id
, ds.repository_extract_data_id*/
, ed.descriptor as "tde_path"
, rd.content as "OID"
from datasources ds
   left join sites s on s.id = ds.site_id
    left join extracts ed on ed.datasource_id = ds.id
     left join repository_data rd on (ds.repository_data_id = rd.id OR ds.repository_extract_data_id = rd.id)

A.I.M. for Success with Logentries and Tableau

#Data16 may be over and the Server Admin session may have ended but don’t let the fun stop there. Continuing with the recommendation and urgency of making sure you monitor your Tableau/Analytics infrastructure, Logentries and I have teamed up on a Whitepaper regarding all things Alerting, Integrating and Monitoring.

You’ll find a very through analysis of the *why* it’s important to have a strategy in place as well as tips/tricks and recommendations for further reading. What’s more, you’ll find out how easy it is to get a variety of log data back into Tableau for deeper analysis.

So, get the Whitepaper and spend Thanksgiving implementing it. Just kidding. Take a break for Thanksgiving and then do this 🙂

Best,

Mike

 

Hello Analytics Engineering

I’ve talked a lot about Analytics and how it must be re-imagined for today’s often frantic pace of innovation in both technology and theory. What’s typically missing is the other side of the Analytics coin: Engineering. Most people tend to forget that before one can either explore or view some sort of analytics, there is a lot of movement that must go into preparing that data. This doesn’t even include ‘Self Service’ Analytics which is another story in and of itself!

You often hear: “Well, I just want all the data, I don’t care how hard it is.” Which translates to: “I don’t know what I want but tons of potentially useless data might get me an answer.”

Enter Analytics Engineers.

The data world is modular and in constant flux. One must be able to adapt, move data and present a tool in the most efficient and scalable way possible.

Enter Analytics Ops.

In order to do that, there has to be a ‘glue’ that can hold all the pieces of the Analytics/Data world together. That glue is #PowerShell and I’m thrilled to say that I’ve been selected as a speaker at the Global Devops #PowerShell 2017 summit. Twice!

I’ll be speaking about a different use for PowerShell and one that I’m pretty excited to share with the world. Basically, PowerShell and Data go very well together.

So, here are the two sessions I’ll be speaking at:

Session 1

Time:

Tuesday at 3pm PST:

Title:

Operation Float the Bloat: Use PowerShell for Tableau Server QA and Alerting

Abstract:

Business Intelligence and, generally speaking, the data landscape is a mixture of moving parts, sometimes so many that it’s hard to keep track of what process does what. An Enterprise BI platform is just that, a platform. Within that, we’re dealing with data from APIs, databases (relational and non-relational), text files and many more data variables. Missing from the Analytics infrastructure, however, is a proper log analytics and QA strategy. How do you know your platform is performing as it should be? How do you know data is secure? How do you streamline analytics so users are left with correct and fast data? How do you ensure users are publishing quality content at scale? In this session, we’ll show you how to do all that and more with Tableau Server and PowerShell by focusing on three pillars: Alert, Integrate and Monitor. We’ll use PowerShell and custom functions to ‘Garbage Collect’ old content and archive on Amazon S3, we’ll leverage log files from both the BI Platform and our servers (Windows and Linux) to monitor and maintain the condition and health of the analytics infrastructure. We’ll use PowerShell to easily convert the analytics data (worksheets and views) into a medium upon which we can change anything. We’ll also use PowerShell to dynamically create content in Tableau based on a configuration file. Oh, and, all of this is automated because PowerShell can make it so.

Session 2

Time:

Wednesday at 10am PST

Title:

Using PowerShell for Analytics Engineering (or why PowerShell is the glue for data, big or small).

Abstract:

While there are numerous and exceptional benefits to using PowerShell as an IT Pro and Developer, the hidden gem is its capability for Business Intelligence and Analytics Engineering. In simple terms, it is the lynch-pin of data and analytics. In this session, I’ll demonstrate how PowerShell is used to load, query and aggregate data for Business Intelligence platforms, specifically Tableau Server. What’s more, we’ll automate everything from AWS EC2 instance provisioning, report generation, report delivery, and Log Analytics. Want integration too?! We’ll show you how to reach into pretty much anything via APIs using tools like Chocolatey, cURL, WMI , Git, and Remoting. Oh, one more thing. We’ll use PowerShell classes along with scheduled jobs to make platform administration simple and stable. In the end, we’ll have modules, functions and custom scripts. Adding PowerShell to your data toolbox will provide enormous benefits, not the least of which is adaptability in the rapidly changing data landscape.

 

 

Hope to see you there!

-Mike

Automatically Choose the Max Filter Date in your Tableau Filter

Here’s a quick tip for Tableau users out there who may have tried to get their individual date filter to automatically choose the max data while (here’s the catch) still allowing for the drop down with other dates in it. Sure, it’s possible to use the LOD calculations to get you a Boolean but that would be too easy. Sometimes, the people just want to max date selected in their filter automatically. Oh, and, this also works for dynamic parameters.

dynamic-filter-2
I really want the max date. Please.

Here are the steps:

  • SQL to query for max date in column (this is the field used in your filter)
  • Match the date format (eg: mm-d-yyyy or mm-dd-yyyy)
  • Update the workbook section with the filter
  • Save workbook
  • Re-publish

The little bit of code you’ll need to dynamically do this will look like:

[xml]$TsUpdateTWB= get-content "someWorkbook.twb"
$TsUpdateTWB.SelectNodes('//filter/*') | where {$_.level -like '*Status Date*'} | % {$_.member = "#$tsFilterMax#"}
$TsUpdateTWB.Save("someWorkbook.twb")

That’s it! Since PowerShell makes it so easy to deal with XML, all you need to do is find the filter name and update.

This method can work with: Excel files, databases and everything in between. As long as you can query it, you’re golden.

What I’ve done to make this totally automatic is to use the File Watcher class to trigger this to run as soon as someone either saves a file or drops the name of the extract in Slack.

Please let me know if you’d like a demo or just come the TC16 and I’ll be showing this (and more)!

Stop a Tableau Extract Refresh without stopping Server

Oh no! I kicked off this Tableau extract and I need to take it back!! How do I stop it?

As long as I’ve been using Tableau, the ‘run-away’ extract has always eluded me. Sure, there’s netstat and procexp that can sort of get you there in a pretty raw way. If it’s late at night and there’s just one job running, that’s pretty easy to find the port and process ID (netstat -ano in PowerShell 3+).

bye-bye-ts-2
Oops. We didn’t want to run that one.

The challenge is when you have multiple backgrounders (on multiple machines) and they’re all active; you then start asking: which one is it? Hopefully, you guess correctly and, voila, it’s gone.

Until now.

bye-bye-ts-1
The process id for the currently running extract

Here’s what we’ll do:

1.) grab the log from this directory: C:\ProgramData\Tableau\Tableau Server\data\tabsvc\vizqlserver\Logs and follow (or parse) this log: backgrounder*.txt

NOTE: If you have a Log Analytics strategy, you can easily follow this log file and leverage some of the vendor’s command line tools to make this process entirely automatic.

2.) Wrap your code in a PowerShell function which allows you to enter your TWB/TDS name.

Stop-TsExtract -ExtractName '<some name>'
bye-bye-ts-3
The result of stopping the process.

3.) The file in #1 is JSON so it’s super easy to parse and dig out the ‘PID’ which, believe it or not, corresponds to the process on your Tableau Server box. We’re going to look for this key/value pair:

k="ds-parser-connect-extract" AND v.caption=<name of your data source/workbook>

NOTE: You could also look for the session via “k=lock-session” (which you then have to correlate from the other backgrounder log file) but this next value gives you the ability to grab (and enter) the data source/workbook name.

4.) Now, if you’ve set up PowerShell remoting or SSH, you remote into your Tableau Server (via Invoke-Command) and enter (where the variable procID is the backgrounder process):

gps -Id $procID | kill -Verbose

 

BONUS

If you remember, I posted about refreshing your Tableau extracts via Slack here. Well, the next step (if you want to open this up outside the admin group) is the let users drop the name of a data source/workbook in a Slack channel. Those background jobs they enter will be stopped if they’re accidentally started.

Oh, and, I should mention: test, test, test.  This isn’t supported by Tableau, but people believe it should be:) Make your vote count.