Today we’re going to talk about data and how expectations around data usage have changed. Time-to-value expectations that were considered unrealistic not long ago have become the standard.
Let me start with a quote from one of my favorite cult movies – The Princess Bride. No, I will not go with Inigo Montoya’s “you killed my father, prepare to die” or any of the other unforgettable ones. I will quote Miracle Max, the miracle maker, who said: “Don’t rush me, sonny. You rush a miracle man; you get rotten miracles. You got money?”
This was the motto behind much of the work done over the years to extract value from data. The idea is that data analysts and data scientists work their ‘magic’ or ‘miracles,’ and that it takes time. And yes, it also costs a lot of money – whether we’re talking about setting up hefty data warehouses or hiring data scientists and analysts.
A lot has changed in the world of data analysis since those ‘miracle’ days.
Data usage has evolved rapidly in the past few years. The approach has shifted from merely understanding past events to predicting what will happen.
Several developments accelerated this shift, among them:
- Billions of IoT devices sending data from their sensors.
- Massive improvements in big data technologies, especially elastic cloud-based big data technologies.
- Evolution of Data Science tools and libraries like SciPy and TensorFlow, making machine learning much simpler to implement.
- A wealth of BI tools that let you analyze data quickly, with less time spent writing code and queries.
This means that where once a strong DBA team organized the databases and data warehouses and enabled access to them, today there are more flexible data structures, with:
- Data lakes into which a wide variety of data is poured.
- Large data warehouses giving great analytic power.
In today’s world of data warehouses like Redshift, BigQuery, and Snowflake, organizations get a lot of elasticity and can go from zero to a fully-fledged data warehouse pretty fast.
So the miracle women and men of data analytics and data science can now work miracles much faster.
And once organizations realize the potential of utilizing more and more data, they get ‘hooked.’ I see many companies simply throwing a plethora of data into their lakes and warehouses, enabling more teams within the organizations to access them.
And that’s only natural. As organizations realize that they sold more products because of optimization done for marketing and sales purposes, they want to implement more data-driven decisions in other areas, such as customer success, operations, purchasing, HR, and so on.
Time to Value, Time to Wow!
Everyone is riding the same wave of data-driven value, even your competitors.
It is crucial to cut the time you spend on activities that are not actual data analytics, so that you can act on the data sooner.
Furthermore, since we’re all consenting adults deriving value from our data, the more time you actually spend on it instead of being distracted by other factors, the more chances you have to get not just value – but a wow.
And by wow, I mean an unexpected, expectation-exceeding value.
Here’s an example of a wow value:
Data scientists at a certain game studio discover that not only can they correlate specific features with the probability that users will purchase in-game cosmetics, but they can also match distinct clusters of users with particular classes or colors of in-game cosmetics.
You can imagine just how easily they can optimize their conversions…
Now that your organization has massive amounts of data sitting in your data repositories, and you have the right people and tools to turn this data into value, some of the things that can slow you down are security, compliance, and privacy.
Since the data probably contains sensitive information, there are inherent risks involved, such as data leaks, and there are different compliance frameworks that you need to align with and report against. Plus, you need to be aware of the PII (Personally Identifiable Information) in your data: where it is, who has access to it, and who actually accesses it.
For example, let’s say that our business is a massively multiplayer online game (MMO). We’re getting a lot of data from our game servers (as well as from other sources, such as enrichment services, web analytics, etc.). We’d like to find a set of features that predicts which players will spend the most money on premium cosmetic items, so that we can target them with a discount coupon.
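To make the idea of "a set of features that predicts spend" a bit more concrete, here is a minimal sketch of a first screening step: ranking candidate features by how strongly they correlate with cosmetic spend. Everything in it is an assumption made for illustration. The field names (`hours_played`, `friends`, `cosmetic_spend`) and the player records are made up, and a real project would go on to train and validate an actual model rather than stop at correlations.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def rank_features(players, target):
    """Return (feature, correlation) pairs, strongest absolute correlation first."""
    features = [key for key in players[0] if key != target]
    spend = [p[target] for p in players]
    scores = [(f, pearson([p[f] for p in players], spend)) for f in features]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical, made-up player records (not real data).
PLAYERS = [
    {"hours_played": 5, "friends": 1, "cosmetic_spend": 0.0},
    {"hours_played": 20, "friends": 3, "cosmetic_spend": 4.0},
    {"hours_played": 40, "friends": 2, "cosmetic_spend": 9.0},
    {"hours_played": 80, "friends": 4, "cosmetic_spend": 20.0},
]

if __name__ == "__main__":
    for name, score in rank_features(PLAYERS, "cosmetic_spend"):
        print(f"{name}: {score:.2f}")
```

On this toy data, `hours_played` ranks above `friends`. Correlation only flags candidates worth a closer look; it says nothing about causation and would not replace a properly evaluated prediction model.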
We’d like to spend most of our efforts as a business on analyzing the data and creating the prediction algorithms, thus minimizing the time spent on other things. We’d like to sort out things like data access, security, and compliance quickly and simply, and—should we need to introduce software changes to our MMO—we don’t want them to take the edge out of our operation by being too slow.
So, for example, if you’re using a Snowflake warehouse to analyze data pulled from the data lake and from other sources, you want to make sure you follow Snowflake’s security guidelines and immediately identify any sensitive data that is retrieved as part of this project. You also want to be able to build data access audit reports for different compliance regulations without disrupting the ‘Wow creation.’
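As a rough illustration of the "who actually accesses it" question, the sketch below turns a list of access-log entries into a per-user summary of the sensitive columns each user read. It is not Snowflake’s API: the log format, the `PII_COLUMNS` set, and the function name are all hypothetical, and in a real setup the log and the column classification would come from your warehouse and your data catalog.

```python
from collections import defaultdict

# Hypothetical tagging of sensitive columns; in practice this would come
# from a data catalog or a classification tool.
PII_COLUMNS = {"email", "payment_card", "ip_address"}


def pii_access_report(access_log, pii_columns=PII_COLUMNS):
    """Map each user to the sorted list of PII columns they actually read."""
    touched = defaultdict(set)
    for entry in access_log:
        hits = pii_columns.intersection(entry["columns"])
        if hits:
            touched[entry["user"]].update(hits)
    return {user: sorted(cols) for user, cols in touched.items()}


# Made-up log entries: one dict per query, listing the columns it read.
ACCESS_LOG = [
    {"user": "analyst_a", "columns": ["player_id", "hours_played"]},
    {"user": "analyst_b", "columns": ["player_id", "email"]},
    {"user": "analyst_b", "columns": ["ip_address", "cosmetic_spend"]},
]

if __name__ == "__main__":
    for user, columns in pii_access_report(ACCESS_LOG).items():
        print(user, columns)
```

Users who never touched a sensitive column are left out of the report, which keeps the output focused on what an auditor would actually ask about.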
To make use of the data we analyzed, we now need to make adjustments to our software. There need to be quick and agile development cycles. For such rapid iterations, especially in large codebases, it would be wise to make sure you reduce the build time so you can take advantage of the data analytics. This can be done by distributing the build cycles across your compute power.
Software Deployment Speed Is of the Essence
Security is not the only thing you need to watch out for when crunching all that data.
Most companies cannot afford slow software processes that keep them from acting on their data in time.
Quick and agile development cycles are necessary, but with all that data (which usually comes with a large codebase), compilation can take a long time, stretching the time-to-wow. And it’s just time wasted waiting around instead of making good use of that data. Luckily, technology helps here as well. We use distributed processing technology, which harnesses the power of other computers’ CPUs to reduce build time.
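For a back-of-the-envelope feel of what distributing a build can buy you, here is a small sketch based on Amdahl’s law: the part of the build that can run in parallel shrinks with the number of machines, while the serial part (linking, for example) does not. The function name and the numbers are illustrative assumptions, and the model ignores network and scheduling overhead, so treat the result as an optimistic estimate rather than a measurement of any particular tool.

```python
def estimated_build_minutes(serial_minutes, parallel_fraction, workers):
    """Estimate build time on `workers` machines using Amdahl's law.

    serial_minutes:    build time on a single machine
    parallel_fraction: share of that time that can be distributed (0.0 to 1.0)
    workers:           number of machines sharing the parallel part
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_part = serial_minutes * (1.0 - parallel_fraction)
    parallel_part = serial_minutes * parallel_fraction / workers
    return serial_part + parallel_part


if __name__ == "__main__":
    # Assumed example: a 60-minute build where 90% of the work can be distributed.
    for n in (1, 10, 50):
        print(n, round(estimated_build_minutes(60, 0.9, n), 1))
```

With these assumed numbers, ten machines bring the 60-minute build down to roughly 11.4 minutes, and adding more machines helps less and less because the serial 10% stays put.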
And if we’re talking about magic, that’s another trick up our sleeves that, honestly, anyone who deals with data analysis should at least be aware of.
Once you can do true agile data-driven value creation and match it with fast software deployments, you can supply your customers with the value they deserve – the wow!