We’ve talked before about big data and the characteristics that define it. Now I want to dive deeper into how businesses move from their current state to successfully applying big data. Conveniently, the steps of big data can be summed up in three words that start with P: Past, Present, and Predictive.
Picture the process as a three-rung ladder. You start at the bottom and can move up to the next rung only after you’ve successfully navigated the one before it.
Past Data
Most companies start collecting data from the day they’re founded. Whether they do anything with that data is a different story, but the data still exists. This is past, or historical, data. Whether your old data counts as “big data” depends on how well it aligns with the 3 V’s that define big data.
If your data isn’t particularly valuable—or you’re not doing anything with the data you have—then you’re still on the ground level. However, if you do have past big data and you’re using it, then congratulations, you’re on the first rung of the ladder.
One of the most common places companies find past data is in legacy systems, meaning software platforms that have been phased out. Whether this legacy data is valuable depends on a number of factors:
- Recency – Does the data still reflect conditions similar to your current business model?
- Volume – How much data exists?
- Accuracy – How clean is the data? Was it collected and stored using best practices?
If your data is old, there isn’t much of it, and it’s not clean, then analyzing it won’t provide any value. However, when past big data meets the necessary requirements, it can bring forward powerful insights for your business.
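To make those three checks concrete, here is a minimal sketch of how you might summarize a legacy export before deciding whether it is worth analyzing. Everything in it is an assumption for illustration: the function name `assess_legacy_data`, the `recorded_on` field, and the idea of using "all required fields are filled in" as a rough stand-in for accuracy. Your own systems will have their own fields and their own definition of clean.

```python
from datetime import date


def assess_legacy_data(records, today, required_fields):
    """Summarize recency, volume, and a rough accuracy proxy for legacy records.

    `records` is a list of dicts. Each is assumed (hypothetically) to carry
    a `recorded_on` date plus whatever business fields you care about.
    """
    volume = len(records)
    if volume == 0:
        return {"volume": 0, "days_since_newest": None, "complete_fraction": 0.0}

    # Recency: how long ago was the most recent record captured?
    newest = max(r["recorded_on"] for r in records)

    # Accuracy proxy: share of records with every required field filled in.
    complete = sum(
        all(r.get(field) not in (None, "") for field in required_fields)
        for r in records
    )

    return {
        "volume": volume,
        "days_since_newest": (today - newest).days,
        "complete_fraction": complete / volume,
    }
```

A summary like this won’t tell you whether the data is valuable, but it gives you numbers to discuss instead of a gut feeling.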
Present Data
Present data can be hard to distinguish from past data. If it’s all in the same system, when does it turn from present to past?
The best way to illustrate present data is with an example.
Manufacturing Production
The best time to catch an error in production is before it happens. The second best is as it’s happening, in other words, the present. When you measure current data in the manufacturing process (for example, product size, defect percentage, and other business-critical factors), you can identify and fix problems before they become costly.
For example, let’s take a manufacturing plant that makes widgets. Each widget should weigh between 9.75 lbs and 10.25 lbs. When the weight is in that range, the widgets most likely meet quality standards. However, if the weight falls outside that range, the widgets are considered defective and, consequently, unsellable.
If the manufacturer has processes and tools in place to track the weight of widgets in real time, they can see right away if a costly pattern of defects starts to arise. They can set rules to halt production or initiate a human review if the data hits a certain threshold: for example, if the widget weight falls outside the acceptable range more than 1% of the time in any rolling 24-hour period.
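Here is a rough sketch of what that rule could look like in code. The weight range, the 1% limit, and the 24-hour window come straight from the example; the class name `DefectMonitor` and its methods are made up for illustration, and a real plant would feed readings from its own sensors or production system.

```python
from collections import deque
from datetime import datetime, timedelta

# Values taken from the widget example in the text.
MIN_WEIGHT_LBS = 9.75
MAX_WEIGHT_LBS = 10.25
MAX_DEFECT_RATE = 0.01          # 1% of widgets
WINDOW = timedelta(hours=24)    # rolling 24-hour period


class DefectMonitor:
    """Hypothetical helper: tracks weights and flags a high rolling defect rate."""

    def __init__(self):
        # Each entry is (timestamp, is_defective), oldest first.
        self.readings = deque()
        self.defects = 0

    def record(self, timestamp, weight_lbs):
        """Add one reading; return True if the rule says to stop and review."""
        is_defective = not (MIN_WEIGHT_LBS <= weight_lbs <= MAX_WEIGHT_LBS)
        self.readings.append((timestamp, is_defective))
        self.defects += is_defective

        # Drop readings that have aged out of the 24-hour window.
        while self.readings and self.readings[0][0] <= timestamp - WINDOW:
            _, was_defective = self.readings.popleft()
            self.defects -= was_defective

        return self.defect_rate() > MAX_DEFECT_RATE

    def defect_rate(self):
        """Share of readings in the current window that are out of range."""
        if not self.readings:
            return 0.0
        return self.defects / len(self.readings)
```

Each call to `record` returns `True` when the share of out-of-range widgets in the last 24 hours exceeds 1%. Note that this sketch will trigger on a single bad widget if it is the only reading in the window, so in practice you would probably also want a minimum number of readings before the rule kicks in.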
The ability to catch potential issues quickly (and do something about them) is a tangible way to apply present data for cost savings.
Let’s place this example on the three-rung ladder. Technically, the manufacturer could start tracking and set thresholds without past data. But then the thresholds they choose, and the decisions they make when data falls outside them, will be based on what? Intuition?
By getting past data in order, the manufacturer in this example can learn 1) the proper weight range, 2) the “normal” or “acceptable” percentage of defects, and 3) the threshold percentage at which action should be taken. With those answers in hand, you can be more confident in your real-time decisions. Quality, clean past data (the first rung of the ladder) is vital for making rung two a success.
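As one illustration of letting past data set the threshold, the sketch below takes a history of daily defect rates and suggests an alert level a few standard deviations above the historical average. This is a common rule of thumb borrowed from control charts, not a method prescribed here, and both the function name `suggest_alert_threshold` and the default of three standard deviations are assumptions you would want to tune.

```python
from statistics import mean, stdev


def suggest_alert_threshold(daily_defect_rates, num_std_devs=3.0):
    """Return (baseline, threshold) from historical daily defect rates.

    baseline  = the "normal" defect rate seen in past data
    threshold = baseline plus a margin based on how much past rates varied
    Needs at least two days of history to measure variation.
    """
    baseline = mean(daily_defect_rates)
    spread = stdev(daily_defect_rates)
    return baseline, baseline + num_std_devs * spread
```

If the history is short or messy, the suggested threshold will be just as unreliable, which is exactly why the first rung has to come first.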
Predictive Data
Predictive data analysis, which is often grouped under the artificial intelligence and machine learning umbrellas these days, uses past and present data to make predictions about the future.
Already, you can see that the first two rungs of the big data ladder are required for predictive analysis, the top rung. You need past data to build models, and you need current data to make those models work with any sense of certainty.
When we say predictive analysis, we don’t mean seeing into the future with some sort of crystal ball. Predictions are essentially estimates based on inputs and data models. They show different potential scenarios that have relative probabilities of happening. To improve predictions, we’re always on the lookout for ways to improve our past data and for better predictors like leading and lagging indicators.
The third rung of the ladder is the hardest to reach. Each rung requires more resources and more expertise. You may not have any dedicated data and analytics resources as you jump into your past data. But by the time you’re working through rung three, you may have a team of people dedicated to business intelligence.
It’s important, as you work your way up the ladder, to know your reality—start where you are. If you have messy past data and you try to jump to rung two, you’ll find yourself using data to make the wrong decisions. Moving up the ladder is a process, and your goal is to keep progressing upward. To do so, you have to put in the work to make it happen.
Are you interested in seeing what potential your past and present data could have? Reach out.