How clean is your data?

The age of big data is essentially the age of clutter. Now that companies are sensing and measuring every step in their supply chains, they are accumulating huge troves of information that must be organized, prioritized, and put into meaningful context before it becomes useful. This is how big data platforms make their money: by making sense of the data flows that characterize and influence business decisions.

Yet we are quickly learning that big data is about more than organizing and analyzing information. A more fundamental issue lies at the heart of big data's effectiveness and value, and companies frequently ignore it. That issue is data cleanliness: the idea that consistent, easily processed stores of information are a prerequisite for making big data useful.

Increasingly, companies are recognizing that the information they get from analysis is only as good as the data they put in. When the inputs are flawed, the outputs are necessarily flawed as well. "Garbage in, garbage out" means that only clean information will prove truly useful at the analysis stage. Put in hard terms, all of the money that companies spend on organizing, processing, and analyzing big data can be for naught if the data itself is flawed.

In the constant churn that characterizes most businesses, the rush to gather and analyze data can leave little time (or little inclination) to make sure that the information going into the system is clear, consistent, and comprehensive. This is the "good enough" school of big data, in which analyzing whatever information is on hand is seen as the best that can be achieved within a limited timeframe.

The flaw in this approach is that the value of big data derives from its comprehensiveness. When corners are cut on which data gets included, or when some data is deliberately set aside because it is difficult to process, the value of the analysis is necessarily diminished. If the most critical data points happen to sit in the data sets that aren't analyzed, the entire investment in big data is likely to fail.

For other firms, the value of clean data is outweighed by the perceived cost of bringing that information up to standard. "We'd like to look at every part of the business," they say. "But if we spent all the time and energy needed to bring every piece of data up to code, it would take more resources than we budgeted for." There will always be a cost/benefit analysis for big data: it has to produce more value than it consumes in order to be worthwhile.

Yet the fact that some data needs to be "cleaned up" may offer valuable insights in and of itself. Businesses that look into the root causes of flawed data may find strong reasons to bring that data into compliance, perhaps by regularizing processes and minimizing exceptions. An investment in cleaning up data is also an investment in cleaning up the processes that produce the data in the first place.

In big data, as in many other things, cleanliness is next to godliness. Is your company struggling to capture value from unclean data? We can help.