The use cases for big data analytics seem to be as endless as the data itself. And yet finding useful ways to present data analytics remains challenging. Most users don't have time to wade through oceans of data themselves. Being able to predict big data analytics use cases and provide consumable streams of data from the static is a role that's growing in importance and prevalence, according to Radhika Subramanian. Subramanian is the CEO and cofounder of Emcien, a company that builds data analysis tools. She spoke at the 2015 Big Data Tech Conference in April in Boston.
Your session at Big Data Tech Con was about the emerging position of "data chef." What are data chefs and how do they fit into the enterprise development team?
Companies are asking, "What is big data and how do I use it?" The data chef comes up with big data analytics use cases and recipes. Taking us from the raw ingredients to an end result that can be easily consumed -- that is the job of the data chef. Data scientists are building a toolset and those tools will become standardized. We need folks with subject matter expertise, who understand what a particular type of data means and who understand how to use the tools. These folks will be able to create deliverables. They will live at the intersection of end user knowledge and data science. They don't have to know how to make new tools; they just have to be able to use the existing tools to make the data consumable.
People are jumping up and down about gaining business insight with big data. A whole bunch of numbers is not insight. We need results that make sense in a consumable way. We need to know what those numbers mean. We need to know how to make better decisions based on the numbers. And that's very context dependent. So the tools can't do it all themselves. We need people who know how to use the tools.
What are the tools, techniques, and recipes every data chef should be familiar with?
There are a lot of important tools. At the bottom is storage. It's like your pantry, stocked with an inventory of raw data -- the ingredients. A chef needs to keep ingredients fresh, and a data chef needs to store raw data properly.
After that, the chef has an array of appliances and utensils for preparing and cooking the ingredients. The data chef has data analysis tools. These are the most important ones. This is where the bulk of the work gets done.
Then the food has to be plated. We don't expect a chef to just plop the chow on a plate any old way. It has to be made to look pretty. That's our visualization layer for the data chef. It's either putting together charts and graphs in a dashboard for direct consumption, or else packaging the data and formatting it properly for some other engine downstream.
Machine learning is another important component, along with graph analytics. These tools are about making connections between the data. About knowing what raw data connects with what other raw data and how. There are also a few other raw data tools, and if you know the whole stack, you can build a consumable end product.
What might that look like in a real world example?
Radhika Subramanian, CEO and cofounder, Emcien
An example of preparing data for another engine downstream is Amazon's recommendation engine. It works on sales data and finds recommendations based on the user and other users whose purchase histories are similar to the current user's. But it doesn't present that data to the customer directly. It stores it and works with it and then when the customer is interacting with the website, there are appropriate times when the website will refer back to the recommendation data and put up a pretty advertisement based on the user's buying history.
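The idea of recommending from similar purchase histories, then storing the result for a downstream engine, can be sketched in a few lines of Python. This is only an illustration, not a description of how Amazon's system works: the purchase data, the `jaccard` and `recommend` functions, and the `recommendation_store` dictionary are all hypothetical, and Jaccard set overlap is just one simple way to measure how similar two histories are.

```python
from collections import Counter

# Hypothetical purchase histories: user -> set of purchased items.
purchases = {
    "ann": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove", "sleeping bag"},
    "cat": {"novel", "lamp"},
}


def jaccard(a, b):
    """Overlap between two item sets: 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, purchases, top_n=3):
    """Rank items the user has not bought, weighted by how similar
    the users who did buy them are to this user."""
    own = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        similarity = jaccard(own, items)
        if similarity == 0.0:
            continue  # no shared purchases, so no signal
        for item in items - own:
            scores[item] += similarity
    return [item for item, _ in scores.most_common(top_n)]


# Computed ahead of time and stored; a website would look this up later
# instead of showing the raw analysis to the customer.
recommendation_store = {user: recommend(user, purchases) for user in purchases}

print(recommendation_store["ann"])  # ['sleeping bag']
```

The split matters more than the scoring: the analysis runs in the background, and the customer only ever sees the stored result at the moment it is useful.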
Similar things are being done with [Internet of Things] in hospitals. All of the appliances and devices are connected to the Internet. The system is collecting and analyzing a lot of data, but it's not being pushed to a central dashboard. Instead, when a particular light begins flickering, the system generates an alert for the maintenance department. The maintenance department doesn't care if 99 out of a hundred light bulbs are functioning at 100%. They just want to know which ones, if any, need attention right now.
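The alert-instead-of-dashboard pattern amounts to filtering for exceptions. A minimal Python sketch, assuming a hypothetical `Reading` record with a made-up `health` score and an arbitrary alert threshold (none of these come from a real hospital system):

```python
from dataclasses import dataclass


@dataclass
class Reading:
    device_id: str
    health: float  # hypothetical score: 1.0 means fully functional


ALERT_THRESHOLD = 0.8  # assumed cutoff, chosen only for illustration


def devices_needing_attention(readings, threshold=ALERT_THRESHOLD):
    """Return only the devices whose health falls below the threshold."""
    return [r.device_id for r in readings if r.health < threshold]


readings = [
    Reading("bulb-017", 1.0),
    Reading("bulb-042", 0.35),  # the flickering one
    Reading("bulb-101", 0.97),
]

for device_id in devices_needing_attention(readings):
    print(f"ALERT for maintenance: {device_id} needs attention")
```

Healthy devices never reach the maintenance department at all; only the exception does.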
Either one of those previous examples is a data use case, or a recipe. It's the path the data takes to get to the end user. It's all about the last mile.
How do we go from custom-built artisan solutions to robust, repeatable solutions?
That's a good point. More and more, we're not going to want a few Ferraris here and there that are each custom-built. We want a Ford manufacturing plant that can pump out a car every five minutes. We don't have enough data scientists to go around making custom tools for every company. But we don't need that. The data scientists can engineer general use tools.
In the kitchen, we don't have too many different models of appliances. But we have thousands of chefs all doing different things with those appliances. You don't need to know how to build a computer to be able to do valuable work with it. It's the same thing with big data.
Every company has a lot of data and so a lot of ingredients. If they have a lot of chefs, there are a lot of big data analytics use cases they can discover.
The enterprise already has ERP systems, CRM, and so on, and all of these systems are constantly feeding back data. Now there is a need for people with enough understanding of data in general, as well as of their own fields, to figure out the use cases for analyzing the data. The executives will be served up the critical data they need to make decisions. They won't have a blanket view of all one thousand customers. Instead, they'll have a look at the most critical customers and the ones that need attention for whatever business-specific reasons. We are definitely moving towards the data-driven organization.