Tableau Server Log Analytics: Easy Parser

I’ve mentioned this before, but it’s worth repeating: Log Analytics and Tableau Server are a wonderful combination. There’s a ton of helpful information in Tableau logs (and pretty much *all* logs) which, along with the PostgreSQL data, makes for a very good data toolbox.

I’ve also mentioned Logentries a lot when digging through Tableau logs. There are many reasons I use the tool, but the one that makes it most useful is centralized log analysis. Essentially the workflow goes like this: Tableau –> Logs –> Logentries –> Tableau (and around and around). It’s a positive feedback loop of valuable data which can give you insight into things such as:

  • Which workbooks take the longest to load and *where* they are loaded geographically
  • Which users download the most data, how much, and how long it takes
  • HTTP 404s
  • Filters used by a user/workbook
  • Data sources used by a user/workbook
  • and more!

With Tableau, you’re either leveraging a Log Analytics strategy or you’re not. I cannot stress enough how vital it is for Tableau Server administrators to at least have some plan in place for the day you’re suddenly inundated with reports of a ‘slow’ server and/or site.

That said, it’s often easier to have a few functions and tools on hand to make ad-hoc or automated analysis easier. Here’s one: we’ll wrap the Logentries REST API in a PowerShell function. This will simply allow us to pull log data from Apache or VizQL based on a simple parameter.

What’s returned is a neatly formatted CSV which you can then import into Tableau, add to a database or simply use for some quick research. For example, if you want to ensure excessive 404s are handled, you can simply use this function with a filter, parse the results, and look up the offending IP addresses. If necessary, you’d add those IPs to a firewall rule.

More specifically, here’s an example of how you would use the function in PowerShell:

Get-TsLog -leAcctKeyVizQL 'your VizQL key' -leFilterVizQL 'k=end-update-sheet' -workpathVizQL "C:\users\$env:username\Documents" -apikey 'your Logentries API key'
[Image: Here’s where your parsed log data can become a great means to improve performance and react before things happen]
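
If you’re curious what the inside of such a wrapper looks like, here’s a minimal sketch (not the published module). It assumes the Logentries REST Query API lives at rest.logentries.com/query/logs/&lt;log key&gt;, accepts an x-api-key header plus from/to epoch-millisecond parameters, and returns an events array; check the Logentries docs for the exact shape before relying on it:

function Get-TsLogSketch {
    # Illustrative only: pull VizQL log events from Logentries and write a CSV.
    param(
        [string]$leAcctKeyVizQL,   # log key for the VizQL log set
        [string]$leFilterVizQL,    # LEQL filter, e.g. 'k=end-update-sheet'
        [string]$workpathVizQL,    # folder for the output CSV
        [string]$apikey            # Logentries REST API key
    )

    # Query the last 24 hours (assumed: Logentries expects epoch milliseconds).
    $to   = [DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds()
    $from = $to - (24 * 60 * 60 * 1000)

    $uri = "https://rest.logentries.com/query/logs/$leAcctKeyVizQL" +
           "?query=where($leFilterVizQL)&from=$from&to=$to"

    $result = Invoke-RestMethod -Uri $uri -Headers @{ 'x-api-key' = $apikey }

    # Flatten the returned events and save them for Tableau.
    $result.events |
        Select-Object timestamp, message |
        Export-Csv -Path (Join-Path $workpathVizQL 'vizql-log.csv') -NoTypeInformation
}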

The added benefit of feeding this type of aggregated data into your own Tableau data model and database is that it gives the admin historical data for planning and learning.

So, here’s the module on the PowerShell Gallery. Let me know if there are questions.

 

Slack your Tableau Extract – Part II

Ever wish you could drop the name of your Tableau workbook and/or data source into a Slack channel and have it automatically refresh? What if you’re developing and need a quick refresh completed? What if you don’t have tabcmd installed on your machine? What if you want to add a step at the end of your pipeline that drops the name of the content into the Slack channel?

[Image: Example of the name of Tableau content]

I’ve talked about this before, but the catch was that the extract needed to already exist in the ‘background_jobs’ table. Well, that won’t always be true, as people will be doing this for the first time. So, we needed to expand it a bit to include *all* possibilities (workbooks and data sources). Also, in this much improved version, we Slack back to the user and let them know their extract is complete.

[Image: Process for *each* extract (all dynamic)]

That’s the beauty of the ‘Tableau-Slack-Logentries’ integration. When you have a decent number of parts, the whole becomes a fascinating thing.

Here are the steps:

  • Get the data from the Logentries webhook
  • Process the data for each extract
  • Get the current Slack users: you don’t need to do this often (unless you want to)
  • Get a valid list of workbooks and data sources
  • Process the list of extracts: rolling 24 hours
  • Get a valid list of workbook / data source owners
  • Create the Slack content object: basically, a set of checks must all add up for the refresh to run (for example, if the person who dropped the name in the channel isn’t the owner, it won’t succeed); a sketch of the Slack reply follows this list
  • Log it!
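
And here’s a minimal sketch of the ‘Slack back to the user’ step, assuming a standard Slack incoming webhook; the webhook URL, channel, user and content names below are all placeholders:

# Notify the requester in Slack that their extract refresh finished.
$webhookUrl  = 'https://hooks.slack.com/services/T000/B000/XXXX'   # placeholder
$slackUser   = 'jdoe'            # hypothetical requester
$contentName = 'Sales Extract'   # hypothetical workbook/data source name

$payload = @{
    channel = '#tableau-refresh'
    text    = "@$slackUser : your extract for '$contentName' is complete."
} | ConvertTo-Json

Invoke-RestMethod -Uri $webhookUrl -Method Post -Body $payload -ContentType 'application/json'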

 

[Image: Tableau Server example of a completed extract]
[Image: Corresponding message back from Tableau Server]

If anyone is interested in the code and/or a demo, please let me know and I’ll be happy to show it.

Automatically remove (and archive) extracts that fail more than ‘n’ times

Keep it clean

Every now and then, Tableau extracts aren’t refreshed properly. That, in and of itself, isn’t a bad thing. What we’re worried about are the extracts that continue to fail day after day. They consume resources and, most importantly, don’t give a true picture of the data.

[Image: Continual extract failures]

Here’s a simple script that queries the Tableau Server Postgres repository and grabs the extracts which have failed n times (you can choose your threshold). At a very high level, it does the following (a sketch of the first step appears below):

  • Get the list of failed extracts (the most recent fail date should be the current date)
  • Limit it to those above your desired threshold
  • Use the REST API to pull the extracts from Tableau Server
  • Diff the list against the downloaded files and keep only the matches
  • Archive the twb/twbx/tds/tdsx files so users can refer to them later
  • Delete the content from Tableau Server
[Image: How we dynamically switch from site to site *and* between content resources (data source or workbook) with the REST API]
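
As for that first step, here’s roughly what the Postgres query might look like, run through psql from PowerShell. The table and columns (background_jobs, job_name, finish_code, where 1 means failure) come from Tableau’s workgroup repository, but treat the exact filter, host, and port as illustrative and adjust for your version:

# Illustrative: extract refreshes that failed at least $threshold times,
# with the most recent failure landing today. Assumes repository ('readonly')
# access is enabled; 8060 is the default repository port.
$threshold = 5
$sql = @"
SELECT title, COUNT(*) AS fails
FROM   background_jobs
WHERE  job_name IN ('Refresh Extracts', 'Increment Extracts')
  AND  finish_code = 1
GROUP  BY title
HAVING COUNT(*) >= $threshold
   AND MAX(completed_at)::date = CURRENT_DATE;
"@

& psql.exe -h your-tableau-server -p 8060 -U readonly -d workgroup -c $sql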

Taking it a step further

If you have a Log Analytics strategy in place, you can send the output of the script to a file and have the log agent follow it. This will give you and the rest of the Tableau admin team some insight into what is failing (beyond what I’ve talked about before).
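
A minimal sketch of that output step (the file path and fields are illustrative):

# Append one line per removed extract; point your log agent at this file.
# $title, $failCount and $owner are hypothetical variables from the script.
$logLine = "{0:o} removed='{1}' fails={2} owner='{3}'" -f (Get-Date), $title, $failCount, $owner
Add-Content -Path 'C:\TableauScripts\extract-cleanup.log' -Value $logLine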

You can also integrate with Slack in order to notify the owner that their workbook will be removed until the failure is fixed.

Tableau & Log Analytics Webinar

We all know Tableau has changed the way people use data to make better decisions (business or otherwise). Thanks to Tableau’s robust set of admin capabilities, APIs and integration, users are now empowered to change the way traditional business intelligence software is managed.

Join Tableau, Logentries and Pluralsight on June 16th at 2pm EDT to learn how you can build a simple analytics stack for automating your Tableau alerting, integration and monitoring workflows.

See you there!

-Mike

Use the Logs: Tableau Server Log Analytics – Part 4

Week 4: Data Engine

For a review of what we’ve covered, make sure to check out the previous three weeks of log analytics for Tableau Server:

Week 1: Backgrounder

Week 2: Apache

Week 3: VizQL

Introduction

If you’ve read through all of the Log Analytics series, then you’re in pretty good shape when it comes to understanding the ‘voice’ of your Tableau Server. For our final week, we’re going to talk about the Tableau Server workhorse: the Data Engine. Yes, its log might not be the friendliest of formats, but with the right Log Analytics tool and strategy, you’ll finally be able to understand how uploads progress, how long queries take to execute, how many ‘TEMP.#Tableau…’ tables there are and more! It’s a really interesting log set simply because it gives you a good idea of how well users are designing their extracts. You should rarely see excessive query times, as the Tableau Data Engine is pretty well put together. However, as all things go, sometimes stuff just doesn’t work.

 

[Image: The Data Engine EKG]

If your environment is both extract heavy and view heavy, following this log (and the others mentioned above) will allow you to tune your environment with ease. You can, as we do, support over 3,500 users with hundreds of extracts on one machine.

Where’s the Log

From the Tableau Server admin guide:

The Tableau Server log directory is C:\ProgramData\Tableau\Tableau Server\data\tabsvc\logs if you installed Tableau Server on drive C, unless otherwise noted in the table below.

Data Engine, then, is in the folder dataengine.

What does it do for me?

Again, from the Tableau Server admin guide:

‘There will be a tdeserver log file for each day with information about data extracts and queries, and responses to VizQL server requests.’

But there’s so much more. Some things (but not all) we watch for with the Data Engine:

  • Upload size over time
  • Query Execution time > 10 seconds
  • ‘tdeserver: connection dead’ errors

For example, if you’re using a log analytics tool, you might be able to run a query like this:

[Image: Graph of Data Engine activity]

 

where(/UploadSetSize:  upload_guid=\d size=(?P<UploadSize>\d*)/) calculate(sum:UploadSize) timeslice(100)

[Image: Text details of Data Engine activity]

With the above query, we can get information on the size of the extracts being uploaded. You can even trend this over time to correlate it with latency issues for users who may be experiencing ‘slowness’. And you do this by picking out the session IDs from the log and tying them back to the files in the vizqlserver/logs directory (see how it all connects 🙂 ).

[Image: Linking to another log via session ID]
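
A quick way to make that link from the command line, if you’d rather not use a log tool for it (the session ID below is a made-up example, and the path follows the admin guide locations above):

# Find every VizQL log entry for a session ID pulled from the Data Engine log.
$sessionId = 'ABC123DEF456-0:0'   # made-up session ID
Get-ChildItem 'C:\ProgramData\Tableau\Tableau Server\data\tabsvc\vizqlserver\logs' -File |
    Select-String -Pattern ([regex]::Escape($sessionId)) |
    Select-Object Filename, LineNumber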

Keep in mind, this is also how you’d want to leverage both alerting and tagging with your log analytics tool; if query time is consistently high and you’re seeing a lot of longer load times on dashboards, it may be time to investigate how users are designing their dashboards with their extracts.

[Image: Query timeline]

where(/time:(?P<Time>\d*)/) calculate(SUM:Time) timeslice(100)

As the above query will show, a consistent increase in execution time (or total time) may be something to investigate. Be aware, you can see the query being executed by searching for the ‘StatementPrepare’ portion of the log. Some of them can get pretty crazy.

[Image: Daily session ID info, grouped by sess_guid]

What should I be Alerted on?

Only those alerts which would require you to take action. Namely, these two:

  • ‘tdeserver: connection dead’ errors
  • Query Execution time > 60 seconds

So that’s it! Four weeks of Tableau Server Log Analytics and, believe it or not, there could be so much more! The four we’ve talked about cover a good portion of what you’d need to understand the basics of your environment.

And now that I’ve told you about consuming Tableau logs, why don’t you create your own?! Yes, in a series I’m currently working on, I’ll show you how to generate more detailed, user- and security-focused logs related to Tableau Desktop (and Server).

 

Use the Logs: Tableau Server Log Analytics – Part 3

Week 3: VizQL

Intro

While the Apache/httpd log may be the most interesting of the Tableau Server log files, the VizQL logs are equally impressive simply because they are JSON formatted. The depth of information one can gather from these logs provides a very real-time look into the happenings of one’s Server environment. You can find anything from the dimensions of a dashboard as it renders on a client machine to the number of rows returned by a Tableau Server data source for each user. And the most valuable part, well, at least one of them, is the req part of the payload, which links to the Apache logs.

[Image: Understanding total query time by user on Tableau Server]
  • ts: timestamp
  • pid: process ID
  • tid: thread ID
  • sev: severity (e.g. FATAL)
  • req: request ID (the link back to the Apache log)
  • sess: session ID
  • site: site name
  • user: user name
  • k: event key (e.g. ‘end-query’)
  • v: event value (the detail payload, e.g. v.elapsed)

The above list shows the payload for the log and, from that, you can confidently understand your Tableau Server. What’s more, in combination with a Log Analytics tool, you can easily query the log to get a more detailed sense of, for example, how long a query takes to execute for each user.

where(k="end-query") groupby(user) calculate(percentile(90):v.elapsed) sort(desc)

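If you’d rather poke at the raw files than go through a log tool, here’s a minimal PowerShell sketch of the same idea; note it computes a simple average rather than a true 90th percentile, and it assumes each log line is a valid JSON record:

# Summarize 'end-query' elapsed time per user straight from the JSON logs.
$logDir = 'C:\ProgramData\Tableau\Tableau Server\data\tabsvc\vizqlserver\logs'

Get-ChildItem $logDir -File |
    Get-Content |
    ForEach-Object { try { $_ | ConvertFrom-Json } catch { } } |   # skip non-JSON lines
    Where-Object { $_.k -eq 'end-query' } |
    Group-Object user |
    ForEach-Object {
        [pscustomobject]@{
            User       = $_.Name
            AvgElapsed = ($_.Group.v.elapsed | Measure-Object -Average).Average
        }
    } |
    Sort-Object AvgElapsed -Descending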

Where’s the Log

From the Tableau Server admin guide:

The Tableau Server log directory is C:\ProgramData\Tableau\Tableau Server\data\tabsvc\logs if you installed Tableau Server on drive C, unless otherwise noted in the table below.

VizQL, then, is in the folder: vizqlserver\logs. Note the difference here as we’re directly in the vizqlserver directory.

What does it do for me?

A lot.

[Image: Tableau Server Resource Manager data. Very valuable.]

But there’s so much more. Some things (but not all) we watch for:

  • Resource Issues (CPU/Memory) by process (very important)
  • Workbook load time > 10 seconds
  • Average query time
  • Average query time by site/user/time of day
  • Query execution > 10/20/60 seconds (these alerts go directly to Slack)
  • Rows returned by user/site/time of day
  • Sev=FATAL

 For example, if you’re using a log analytics tool, you might be able to run a query like this:

where(k="read-metadata" AND v.attributes.dbname="<some data source>") calculate(sum:v.elapsed)

[Image: Analyzing metadata read time for a data source]

With the above query, we can get information on how long it takes to read a data source’s metadata over time.

Keep in mind, this is also how you’d want to leverage both alerting and tagging with your log analytics tool; if query time is consistently high and a dashboard consistently takes a long time to load, then this is a learning opportunity for your users.

I’ve talked about many ways you can start to aggregate these metrics and distribute a daily or weekly summary to your users. If a workbook loads slowly, it’s time to re-train and re-focus the Tableau design and analytics. Most of the time, users aren’t even aware there is a performance issue!

[Image: Performance by sheet on Tableau Server (probably the biggest indicator of poor performance)]

 

What should I be Alerted on?

Only those alerts which would require you to take immediate action. Namely, these two:

  • Sev=FATAL
  • "end-update-sheet" > 60 seconds

Since this log is JSON formatted, it’s pretty much up to you as to what you need to look for (or be alerted on). I can say, however, there is plenty of valuable information in this log that provides the answer to the most common Tableau Server question: Why does my dashboard take so long to load?

 

[Image: Summarized version of a parsed log file, giving hourly/daily performance data]

 

Use the Logs: Tableau Server Log Analytics – Part 2

Week 2: Apache

Intro

I believe the Apache/httpd log is the most interesting of the Tableau Server log files because of the depth of information it provides. What’s more, with the Unique ID at the end of each entry, you can tie this to other, more verbose, logs (for that link, you’ll need to wait until next week when I talk about VizQL).

[Image: Alerts built off of the Apache log]

Thankfully, the Apache log uses the Common Log Format and, with that, you have a standard way to parse this text file. Tableau, in a testament to its vision, has added a few bits to the end of this standard. The three most important are:

  • Content-Length
  • %D (this is the best part and it is, according to the docs, ‘The time taken to serve the request, in microseconds.’)
  • Unique_ID

If you want to see the configuration, look in the httpd.conf file for the bits that talk about ‘mod_log_config’.

[Image: Additions to the Apache CLF]

Having the time taken to serve each request gives us the ability to understand Tableau Server performance and, in one query, get the average time it takes to serve a request over the last {n} days:

where(/HTTP\/1\.1" "-" \d{3} \d([0-9]|1[0-9]) "\d([0-9]|1[0-9])" (?P<TTS>\d*)/) calculate(average:TTS)

[Image: Daily performance summary]
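
If you want a rough version of the same number straight off the files with PowerShell, here’s a sketch that reuses a simplified form of the pattern above (the file glob and log names are illustrative; adjust them to your install):

# Rough average of Apache time-to-serve (%D, in microseconds), using a
# simplified form of the pattern from the LEQL query above.
$pattern = 'HTTP/1\.1" "-" \d{3} \d+ "\d+" (?<TTS>\d+)'
$avg = Get-ChildItem 'C:\ProgramData\Tableau\Tableau Server\data\tabsvc\httpd\access*.log' |
    Get-Content |
    ForEach-Object { if ($_ -match $pattern) { [long]$Matches['TTS'] } } |
    Measure-Object -Average
'Average time to serve: {0:N0} microseconds' -f $avg.Average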

Where’s the Log

From the Tableau Server admin guide:

The Tableau Server log directory is C:\ProgramData\Tableau\Tableau Server\data\tabsvc\logs if you installed Tableau Server on drive C, unless otherwise noted in the table below.

Apache, then, is in the folder: httpd. You’ll want to follow the *.log files and (for this one) the error.log.

What does it do for me?

Again, according to the docs, ‘Apache logs. Look here for authentication entries. Each request in the Apache log will have a request ID associated with it. This request ID is used throughout the server logs and you can use it to associate log entries with a request.’

But there’s so much more. Some things (but not all) we watch for:

  • HTTP status (200, 404, 500, etc.)
  • IP
  • DateTime
  • Request
  • Request type (GET/POST, etc.)
  • Request ID (mentioned above)
  • Time to serve the request
  • CSV downloads
  • Login attempts
  • 404/500 (the bad ones)

 For example, if you’re using a log analytics tool, you might be able to run a query like this:

where(/HTTP\/1\.1" "-" (?P<status>\d{3})/ AND status!=200) groupby(status) calculate(count) sort(desc)

[Image: HTTP statuses over {n} days]

With the above query (which we run on Logentries), we can get information on http status codes and, hopefully, all of them are good.

Keep in mind, this is also how you’d want to leverage both alerting and tagging with your log analytics tool; if there are more than {n} requests of a certain kind over a threshold, say 404s, you want to know about it. Conversely, you’d also want to be made aware of inactivity (or anomalies) in the logs. For example, if a log is normally pretty chatty, you want to know ASAP if it has gone silent.

If you’re really feeling up to it, you can leverage regular expression field extraction and make some custom groupings out of the text file.

In this example, let’s say you wanted to do a quick check on the 90th percentile of the time-to-serve values (NOTE: this is in microseconds) just by using the REST API to query your logs. Here’s what you send with the API call:

where(/HTTP\/1\.1" "-" \d{3} \d([0-9]|1[0-9]) "\d([0-9]|1[0-9])" (?P<TTS>\d*)/) calculate(percentile(90):TTS) timeslice(60)

You can send these as alerts to users via email and/or Slack as they happen. Or you can make this part of a performance summary (via a Tableau dashboard) about your Tableau infrastructure. I mentioned a more robust solution a while back; you can read about it here. We automatically parse the Apache log for the 200s and then use an IP lookup tool to geo-locate possible performance issues with our Tableau infrastructure.
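
Here’s a hedged sketch of that IP-lookup piece, using the free ip-api.com service as one example; any geo-IP service would do, and the IP below is a placeholder:

# Geo-locate a client IP pulled from the Apache log.
$ip  = '203.0.113.10'   # placeholder IP from a parsed log line
$geo = Invoke-RestMethod -Uri "http://ip-api.com/json/$ip"
[pscustomobject]@{ IP = $ip; Country = $geo.country; City = $geo.city; Lat = $geo.lat; Lon = $geo.lon }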

[Image: Custom data model we have for Server performance]

What should I be Alerted on?

Don’t forget: with too many wrong alerts, you’ll miss the right alerts.

For the Apache logs, we alert on these bits:

  • Time to serve > 5 seconds
  • Excessive ‘Auto-Refresh’ events
  • Excessive CSV downloads
  • 404s > a rolling average
[Image: Same alerts/tags for an Apache log]

As mentioned above, I believe the Apache log is the most useful log to gauge a lot of critical Tableau events, not the least of which is access and performance of views/workbooks/data sources.