“It is easier to ship recipes than cakes and biscuits.” - John Maynard Keynes
As an Analytics Engineer, you may hear this a lot: ‘Wait, so you just make dashboards and reports, right?’ It’s common but…
Thanks to all those who attended, and for those who had questions, please let me know if there’s anything I can help with.
Forget about perfection
Your data (information) is a set of free-flowing, dynamic instructions about how business (or whatever) is understood. It will never be perfect. In fact, you don’t want it to be; if it’s perfect (which, again, is impossible) there’s no room for improvement or self-reflection. What’s more, it will lack creative impulse: you don’t think freely if something is ‘perfect’ and done.
Data is fluid
The goal should be to bring the certification/governance process *to* the data. If you must wait, collect, meet, agree and on and on, there is a critical piece missing: data should be certified by what it produces (or its many derivations). How does this work? How can you certify a CSV file? Simple: Alert, Integrate and Monitor your Analytics Infrastructure.
Essentially, if nothing is created from the data, why would there be a need to certify it? Once something is created, you relentlessly certify, in flight, what is being produced. It’s a sort of fact-checking, data-alerting mechanism that’s entirely possible with the right framework. Which leads to the next point…
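To make “certify in flight” concrete, here is a minimal Python sketch of a check that runs whenever a CSV is produced and returns alerts instead of waiting on a review meeting. The function name `certify_csv` and its three rules (required columns present, at least one data row, no blank values in a key column) are hypothetical; swap in whatever your own certification rules are.

```python
import csv
import io


def certify_csv(text, required_columns, key_column=None):
    """Return a list of alert messages; an empty list means the file passed.

    Hypothetical rules for illustration: required columns are present,
    there is at least one data row, and an optional key column has no
    blank values.
    """
    alerts = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []  # None for an empty file
    missing = [c for c in required_columns if c not in header]
    if missing:
        alerts.append(f"missing columns: {', '.join(missing)}")
    rows = list(reader)
    if not rows:
        alerts.append("no data rows")
    if key_column and key_column in header:
        # short rows get None for absent fields, so treat None as blank
        blanks = sum(1 for r in rows if not (r[key_column] or "").strip())
        if blanks:
            alerts.append(f"{blanks} row(s) with blank {key_column}")
    return alerts


sample = "region,sales\nWest,100\n,250\n"
print(certify_csv(sample, ["region", "sales", "date"], key_column="region"))
# ['missing columns: date', '1 row(s) with blank region']
```

An empty list means the output passed; anything else is what you would route to your alerting channel.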
Collect data about patterns of usage
If you’re not analyzing usage patterns, you’re missing valuable data. With all analytics, there are reasons (1) why a specific set of data is selected and (2) what the user is attempting to do with it. You can easily keep this metadata in AWS S3 (with a good lifecycle policy) or store it somewhere else for potential later use. The point is that if you don’t understand the *why*, you are only seeing one side of the coin.
Keep everything and then figure out what to do with it and then how to certify it.
Leverage the cloud
Don’t be constrained by, or afraid of, combining pieces of different cloud technologies to serve your Analytics infrastructure.
Become durable / resilient
Even though cloud services offer very high monthly uptime percentages, be prepared for something to break. If you do that, you’ll have even *more* creative freedom (crazy, huh?).
Choose to: (1) scale laterally or (2) scale vertically
This is all about re-framing the question around Projects vs Sites.
Why would you choose Sites over Projects, or Projects over Sites? And that can’t be the only choice, right? (Hint: it’s not the only choice.)
I’ve seen benefits to both, but the extra work involved with Sites makes scaling laterally (Sites) much more difficult than scaling vertically (Projects), not to mention the challenges of stepping into the compliance realm.
Remove all the pieces from your base install that can be done elsewhere (e.g., collect the ‘garbage’ but store it on AWS S3 with a good lifecycle policy). That way, your Analytics infra stays light and fast.
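What a “good lifecycle policy” looks like depends on your retention needs. As a minimal Python sketch, assuming the garbage lives under one S3 prefix, is archived to Glacier after 30 days and deleted after a year (illustrative numbers, not recommendations), the configuration can be built as a dictionary in the shape boto3’s `put_bucket_lifecycle_configuration` accepts. The helper name, prefix and bucket name are hypothetical.

```python
def lifecycle_config(prefix, archive_after_days=30, expire_after_days=365):
    """Build an S3 lifecycle configuration for a 'garbage' prefix.

    Returns the dict passed as LifecycleConfiguration to boto3's
    put_bucket_lifecycle_configuration. Day counts are assumptions.
    """
    # sanity check: deleting before archiving would make the transition pointless
    if expire_after_days <= archive_after_days:
        raise ValueError("expiration must come after the archive transition")
    return {
        "Rules": [
            {
                "ID": f"archive-then-expire-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Applying it (requires boto3 and AWS credentials; bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-analytics-archive",
#     LifecycleConfiguration=lifecycle_config("tableau-garbage/"),
# )
```

Keeping the policy as code means it is versioned alongside the rest of your automation rather than clicked together in the console.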
I challenge you to think *bigger* with Tableau. How can you provide more fluid access to insight than anything else?
I’m calling it: 2017 will not confuse Analytics with Reporting
We’ve got too much technology, tooling and too many components to mix the two realms (hint: they’ve never been related… the ‘self-service’ myth just hasn’t separated them yet).
Analytics has depth and is fluid. Reporting is rigid and superficial.
Here is a small example of what I mean. Your #Fitbit is more than a report. Think about that and shake your, er, data-maker 🙂
Look for more on this and other tech bits this year.
Happy New Year!
Ever created a wonderful Tableau dashboard with the added ‘Export to CSV’ functionality? We all have. Click the super-sleek Excel icon and, voilà, the download begins. Send the file, walk away and think: ‘my, was that cool.’
But wait. You get an email complaining about column order. For some reason, the columns you’ve added, perfectly, are all messed up. In fact, some would say they’re in alphabetical order. What the?!
Anyway, here’s an easy PowerShell function that will fix that and send the email with the columns in the correct order.
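The heart of the fix is rewriting the file with an explicit column order. Here is a minimal Python sketch of that idea (not the PowerShell function itself), assuming you supply the intended order, for example the order used on the dashboard. `reorder_csv_columns` is a hypothetical name, and sending the email is left out.

```python
import csv
import io


def reorder_csv_columns(text, column_order):
    """Return CSV text with columns rearranged to follow column_order.

    Columns not named in column_order are appended in their original
    order so nothing is dropped; unknown names raise ValueError.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    unknown = [c for c in column_order if c not in header]
    if unknown:
        raise ValueError(f"columns not in file: {unknown}")
    ordered = list(column_order) + [c for c in header if c not in column_order]
    out = io.StringIO()
    # "\n" line endings keep the output identical across platforms
    writer = csv.DictWriter(out, fieldnames=ordered, lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


exported = "amount,customer,region\n10,Acme,West\n"
print(reorder_csv_columns(exported, ["region", "customer", "amount"]), end="")
# region,customer,amount
# West,Acme,10
```

Raising on unknown names, rather than silently skipping them, surfaces a renamed column before the file reaches anyone’s inbox.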
There are plenty of ways to make a good backup of your analytics content and the options available on Tableau Server are numerous. But, and here’s the better question: are they efficient and redundant enough?
Yes, current Server versions let you keep n days of content backups, but this slowly increases the size of your backups and storage (and puts too many eggs in one basket). Plus, there’s no effective way to turn this option off if you have multiple sites (at least none that I’m aware of). What’s more, you can get away with a great daily backup strategy and a subsequent (automatic) restore to your development machine.
To add to the complexity, what if you don’t want to use the CLI (tabcmd) to download massive data sources (>1GB)? What about users who, through no fault of their own, just click ‘download’ in the GUI? Do you, as Server admins, know the impact this has on the Server? Hint: you should, and it’s bad.
Have users drop the name of their desired content in a shared file (or dedicated Slack channel) and then have daily backups done without using tabcmd or selecting ‘download’ from the GUI. Bonus: ship to AWS S3 and recover that space on your machine! Bigger bonus: logging.
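The request file itself can stay very simple. Here is a minimal Python sketch, assuming a hypothetical format of one content name per line with blank lines and `#` comments ignored. It de-duplicates requests case-insensitively and logs every decision with the standard `logging` module (the “bigger bonus”). `parse_backup_requests` is a hypothetical name, and reading from a Slack channel instead would need its own API calls, which are not shown.

```python
import logging

log = logging.getLogger("backup-requests")


def parse_backup_requests(text):
    """Return requested content names in file order, without duplicates.

    Assumed format: one name per line; blank lines and lines starting
    with '#' are skipped; duplicates are detected case-insensitively.
    """
    names = []
    seen = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        name = line.strip()
        if not name or name.startswith("#"):
            continue
        key = name.lower()
        if key in seen:
            log.info("line %d: duplicate request for %r ignored", lineno, name)
            continue
        seen.add(key)
        names.append(name)
        log.info("line %d: queued %r for backup", lineno, name)
    return names


requests_file = "# daily backups\nSales Overview\n\nsales overview\nInventory\n"
print(parse_backup_requests(requests_file))
# ['Sales Overview', 'Inventory']
```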
Here’s what you’re going to do at a very high level:
- Write super SQL that can dynamically get all the info you need (twb/twbx/tds/tdsx)
- Use psql and the lo_export function to get this ^
- This ^ won’t get you the TDE (if there is one) so you need to find it on the filesystem
- Use the ‘extract’ table and get the UUID for where this ^ is stored in the filesystem
- Parse the XML to update the location of the TDE (Soapbox: for those of you who think it’s ‘hacking’ XML, please make sure you RTFM).
- Zip it up and send to AWS S3 and get it off your Server machine
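The last two steps can be sketched in Python. This is a minimal sketch under assumptions: that the extract is referenced by a `<connection class='dataengine' dbname='...'>` element in the .tds, that the package keeps extracts under `Data/Extracts/`, and that `s3_client` is a boto3 S3 client, whose `upload_file(Filename, Bucket, Key)` method is used. Verify the XML layout against your own files before relying on it. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def point_tds_at_extract(tds_xml, extract_name):
    """Rewrite dataengine connections to point at the packaged extract.

    Assumption: extract connections carry class='dataengine' and store
    the extract location in their dbname attribute.
    """
    root = ET.fromstring(tds_xml)
    for conn in root.iter("connection"):
        if conn.get("class") == "dataengine":
            conn.set("dbname", f"Data/Extracts/{extract_name}")
    return ET.tostring(root, encoding="unicode")


def package_tdsx(tds_xml, tde_path, out_path):
    """Zip the rewritten .tds and its extract into one archive."""
    tde_path = Path(tde_path)
    tds_name = Path(out_path).with_suffix(".tds").name
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(tds_name, point_tds_at_extract(tds_xml, tde_path.name))
        zf.write(tde_path, arcname=f"Data/Extracts/{tde_path.name}")
    return out_path


def ship_to_s3(s3_client, path, bucket, key):
    """Upload the package, then delete the local copy to reclaim space.

    If the upload raises, the local file is left in place.
    """
    s3_client.upload_file(str(path), bucket, key)
    Path(path).unlink()
```

Deleting only after a successful upload is the design choice that keeps a failed transfer from costing you the backup.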
Is that a lot of steps? Maybe. But this whole process is automated. Do a little work up front and you save yourself a lot of time down the line, not to mention a lot of space on your machine. Plus, your Server infra keeps humming along without the added load of multiple versions of your content. You also don’t need to worry about (1) versioning and (2) installing tabcmd.
Here’s a sample of the SQL you should write to scale to whatever content you need to back up and version.
select
    ds.id as "id"
  , ds.luid as "luid"
  , ds.site_id as "site_id"
  , s.name as "site_name"
  , s.luid as "site_luid"
  , case
      when ds.data_engine_extracts = TRUE then lower(ds.repository_url) || '.tdsx'
      else lower(ds.repository_url) || '.tds'
    end as "export_name"
  , ds.data_engine_extracts as "hasExtract"
  /*, ds.repository_data_id
  , ds.repository_extract_data_id*/
  , ed.descriptor as "tde_path"
  , rd.content as "OID"
from datasources ds
left join sites s
  on s.id = ds.site_id
left join extracts ed
  on ed.datasource_id = ds.id
left join repository_data rd
  on (ds.repository_data_id = rd.id or ds.repository_extract_data_id = rd.id)
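Each `OID` returned by the query can be handed to psql’s `\lo_export` meta-command, which writes a large object to a local file. Here is a minimal Python sketch that turns query rows into a psql script; the per-site folder layout and the `lo_export_script` name are assumptions, and psql will not create those folders for you. Because of the OR in the last join, a data source with both a `repository_data_id` and a `repository_extract_data_id` comes back as two rows with the same `export_name`, so decide which OID you want before exporting.

```python
def lo_export_script(rows, export_dir):
    """Build a psql script with one \\lo_export meta-command per row.

    rows are dicts shaped like the query output (site_name, export_name,
    OID). Files are grouped by site to avoid name clashes across sites;
    that layout is an assumption, and the folders must already exist.
    """
    lines = []
    for row in rows:
        if row.get("OID") is None:
            continue  # the left join found no repository_data row
        target = f"{export_dir}/{row['site_name']}/{row['export_name']}"
        # psql reads single-quoted arguments literally; '' escapes a quote
        target = target.replace("'", "''")
        lines.append(f"\\lo_export {row['OID']} '{target}'")
    return "".join(line + "\n" for line in lines)


rows = [
    {"site_name": "Default", "export_name": "sales.tdsx", "OID": 16500},
    {"site_name": "Finance", "export_name": "budget.tds", "OID": None},
]
print(lo_export_script(rows, "/backups"), end="")
# \lo_export 16500 '/backups/Default/sales.tdsx'
```

Save the output to a file and run it with `psql -f`, connecting to the repository as you normally would.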
#Data16 may be over and the Server Admin session may have ended but don’t let the fun stop there. Continuing with the recommendation and urgency of making sure you monitor your Tableau/Analytics infrastructure, Logentries and I have teamed up on a Whitepaper regarding all things Alerting, Integrating and Monitoring.
You’ll find a very thorough analysis of *why* it’s important to have a strategy in place, as well as tips, tricks and recommendations for further reading. What’s more, you’ll find out how easy it is to get a variety of log data back into Tableau for deeper analysis.
So, get the Whitepaper and spend Thanksgiving implementing it. Just kidding. Take a break for Thanksgiving and then do this 🙂
Thanks to all who attended the Server Admin Meetup. For those that could not attend, I’ve attached the slides from the meeting.
If there are any questions, please don’t hesitate to let me know.
I’ve talked a lot about Analytics and how it must be re-imagined for today’s often frantic pace of innovation in both technology and theory. What’s typically missing is the other side of the Analytics coin: Engineering. Most people tend to forget that before anyone can explore or view analytics, a lot of work must go into preparing the data. This doesn’t even include ‘Self-Service’ Analytics, which is another story in and of itself!
You often hear: “Well, I just want all the data, I don’t care how hard it is.” Which translates to: “I don’t know what I want but tons of potentially useless data might get me an answer.”
Enter Analytics Engineers.
The data world is modular and in constant flux. One must be able to adapt, move data and present a tool in the most efficient and scalable way possible.
Enter Analytics Ops.
In order to do that, there has to be a ‘glue’ that can hold all the pieces of the Analytics/Data world together. That glue is #PowerShell and I’m thrilled to say that I’ve been selected as a speaker at the Global Devops #PowerShell 2017 summit. Twice!
I’ll be speaking about a different use for PowerShell and one that I’m pretty excited to share with the world. Basically, PowerShell and Data go very well together.
So, here are the two sessions I’ll be speaking at:
Tuesday at 3pm PST:
Operation Float the Bloat: Use PowerShell for Tableau Server QA and Alerting
Business Intelligence and, generally speaking, the data landscape are a mixture of moving parts, sometimes so many that it’s hard to keep track of which process does what. An Enterprise BI platform is just that, a platform. Within it, we’re dealing with data from APIs, databases (relational and non-relational), text files and many more data sources. Missing from the Analytics infrastructure, however, is a proper log analytics and QA strategy. How do you know your platform is performing as it should? How do you know data is secure? How do you streamline analytics so users are left with correct and fast data? How do you ensure users are publishing quality content at scale? In this session, we’ll show you how to do all that and more with Tableau Server and PowerShell by focusing on three pillars: Alert, Integrate and Monitor. We’ll use PowerShell and custom functions to ‘Garbage Collect’ old content and archive it on Amazon S3. We’ll leverage log files from both the BI platform and our servers (Windows and Linux) to monitor and maintain the condition and health of the analytics infrastructure. We’ll use PowerShell to easily convert the analytics data (worksheets and views) into a format in which we can change anything. We’ll also use PowerShell to dynamically create content in Tableau based on a configuration file. Oh, and all of this is automated because PowerShell can make it so.
Wednesday at 10am PST:
Using PowerShell for Analytics Engineering (or why PowerShell is the glue for data, big or small).
While there are numerous and exceptional benefits to using PowerShell as an IT Pro and Developer, the hidden gem is its capability for Business Intelligence and Analytics Engineering. In simple terms, it is the linchpin of data and analytics. In this session, I’ll demonstrate how PowerShell is used to load, query and aggregate data for Business Intelligence platforms, specifically Tableau Server. What’s more, we’ll automate everything from AWS EC2 instance provisioning to report generation, report delivery and Log Analytics. Want integration too?! We’ll show you how to reach into pretty much anything via APIs using tools like Chocolatey, cURL, WMI, Git, and Remoting. Oh, one more thing. We’ll use PowerShell classes along with scheduled jobs to make platform administration simple and stable. In the end, we’ll have modules, functions and custom scripts. Adding PowerShell to your data toolbox will provide enormous benefits, not the least of which is adaptability in the rapidly changing data landscape.
Hope to see you there!
The Tableau Conference is near! I’m sure most of you are as excited about it as I am. This year promises to, once again, deliver on Tableau’s unique ability to provide a fun and functional conference.
This year, there will be a strong focus on some great IT sessions, meetups and more! So, visit this link and check out what’s going to happen (I’ll be there). If you’ve got questions, please let me know.
See you in Austin!