Preparing data for analytics
Adam Bakhtiar - Consultant
Data processing and analytics have become a core lever for any organization that wants to thrive in the modern business environment. They are changing the basis of competition. Big organizations are using their capabilities not only to improve their core operations but also to launch entirely new business models. The network effects of digital platforms are creating a winner-take-most dynamic in some markets (McKinsey Global Institute, 2016).
Data processing and analytics can make or break an organization. Many organizations search for the most cutting-edge data processing and analytics tools and are willing to spend billions of dollars annually on the best technology money can buy. However, despite the hype, many organizations are failing to implement effective data analytics. A PWC study surveyed 1,800 senior business leaders in North America and Europe at mid-sized companies with more than 250 employees and enterprise-level organizations with over 2,500 employees. The results were surprising: only a small percentage of companies reported effective data management practices (PWC, 2015). According to Gartner analyst Nick Heudecker, in 2017 the implementation failure rate was around 85%, and the cause was the users (Asay, 2017).
In my experience, the rock that most analytics projects founder on is dirty data. Many organizations underestimate the importance of data management until they need to implement analytics; then they realize that the majority of their data is unusable. It’s a really hard conversation to have with your CEO: despite the massive amount of money he’s invested in your CRM and ERP systems, the data is redundant, inconsistent and duplicated all over the place, it lacks detail and some of it is missing altogether. If this goes unchecked, it leads to the classic GIGO scenario: Garbage In, Garbage Out. What usually happens, though, is that the project gets stuck in development hell, where the developer builds a very complicated system to work around the dirty data. This becomes a nightmare to maintain. I often see this in spreadsheet systems.
If the data is dirty, then you must clean it before you analyse it. I don’t care what tools you are using, whether it’s spreadsheets or the latest whizzy product everyone is talking about. This process can be dull, take a long time and cost a lot of money. It doesn’t have the sex and sizzle of the demonstration that your CEO saw, but it is the key ingredient of success. This process deters a lot of companies from moving into analytics and from reaping the benefits that better analytics will give the business. It also explains the 85% failure rate. Data cleanliness is directly correlated with your ability to deliver results that meet your management’s expectations. Get this right and you will be in the 15%!
When I’m starting a project, I ask five questions of the data.
1. How valid is the data?
Does the CRM or ERP system fully validate the data being entered? Are there workarounds that the users have implemented to make the day-to-day running easier for them?
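One way I might start answering this is to count how many records fail a few basic checks. The sketch below is only an illustration: the field names, the rules and the placeholder values such as "n/a" are my assumptions, not something taken from any particular CRM or ERP system.

```python
import re

# Hypothetical validation rules; real ones depend on your own systems.
RULES = {
    # Very loose email shape check: something@something.something
    "email": lambda v: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v or "")),
    # Postcode must be present and not blank
    "postcode": lambda v: bool(v and v.strip()),
    # Reject values users typically type just to get past a mandatory field
    "phone": lambda v: v not in ("", None, "0000000000", "n/a"),
}


def count_failures(records):
    """Return the number of records failing each rule, keyed by field."""
    failures = {field: 0 for field in RULES}
    for record in records:
        for field, is_valid in RULES.items():
            if not is_valid(record.get(field)):
                failures[field] += 1
    return failures


# Made-up sample records for illustration only
records = [
    {"email": "jo@example.com", "postcode": "AB1 2CD", "phone": "01234567890"},
    {"email": "none", "postcode": "", "phone": "0000000000"},
    {"email": "x@x", "postcode": "ZZ9 9ZZ", "phone": "n/a"},
]
failures = count_failures(records)
print(failures)
```

A high failure count on one field is often a hint that users have found a workaround for a mandatory field.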
2. What is the integration like?
Do the ERP and CRM systems produce different data types or formats? It is important that we have these in a unified format before we analyse them. Does 9/11 mean November 9th or September 11th? The answer can have a big impact.
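To make the 9/11 problem concrete, here is a minimal sketch of converting dates from two systems into one unambiguous format. The source names and their date formats are assumptions for illustration; the real formats have to come from each system's own documentation.

```python
from datetime import datetime

# Hypothetical per-source formats; confirm these against each real system.
SOURCE_DATE_FORMATS = {
    "crm": "%d/%m/%Y",  # assumed day-first
    "erp": "%m/%d/%Y",  # assumed month-first
}


def to_iso_date(raw, source):
    """Parse a raw date string with its source's declared format and return YYYY-MM-DD."""
    fmt = SOURCE_DATE_FORMATS[source]
    return datetime.strptime(raw, fmt).date().isoformat()


crm_date = to_iso_date("9/11/2016", "crm")
erp_date = to_iso_date("9/11/2016", "erp")
print(crm_date)  # 2016-11-09
print(erp_date)  # 2016-09-11
```

The same text gives two different dates, which is why the format should be declared once per source rather than guessed record by record.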
3. How will I categorise my data?
Data in transactional systems is stored in a way that makes it easy to enter transactions. This can mean lots of keys with complex links and lots of checks. Analytics is much simpler. We just want to take the things we count and the things we value and analyse them by the who, the what, the where and the when. Once I’ve done this, I find that analytics tools are very easy to work with.
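As a rough sketch of what this categorisation looks like, the rows below carry the who, what, where and when alongside a quantity (the thing we count) and a value (the thing we value). The column names and figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical flattened sales rows: dimensions plus two measures
rows = [
    {"customer": "Acme", "product": "Widget", "region": "North", "month": "2017-01", "quantity": 10, "value": 250.0},
    {"customer": "Acme", "product": "Gadget", "region": "North", "month": "2017-02", "quantity": 2, "value": 80.0},
    {"customer": "Bolt", "product": "Widget", "region": "South", "month": "2017-01", "quantity": 5, "value": 125.0},
]


def summarise(rows, dimension):
    """Total the quantity and value measures for each member of one dimension."""
    totals = defaultdict(lambda: {"quantity": 0, "value": 0.0})
    for row in rows:
        bucket = totals[row[dimension]]
        bucket["quantity"] += row["quantity"]
        bucket["value"] += row["value"]
    return dict(totals)


by_product = summarise(rows, "product")
by_region = summarise(rows, "region")
print(by_product)
print(by_region)
```

Once the data is in this shape, the same function answers the who, what, where or when question just by changing the dimension name.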
4. Does my data have redundancy?
Once I’ve started to categorise my data, I can start to spot duplications in it. Why do two departments in my University call the same course different things? Why does Acme Marketing appear in my CRM system while my ERP system has ACME Global Marketing? Are they even the same entity? Once we’ve started categorising, these anomalies stick out like a sore thumb.
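A simple way to surface candidates like the Acme example is to normalise the names and compare them for similarity. This is only a sketch: the 0.75 threshold and the third company name are assumptions, and a flagged pair still needs a person to decide whether the two really are the same entity.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def similar_pairs(names, threshold=0.75):
    """Return pairs of names whose normalised forms look alike (threshold is an assumption)."""
    pairs = []
    for first, second in combinations(names, 2):
        score = SequenceMatcher(None, normalise(first), normalise(second)).ratio()
        if score >= threshold:
            pairs.append((first, second))
    return pairs


# Names as they might appear across a CRM and an ERP system (illustrative)
names = ["Acme Marketing", "ACME Global Marketing", "Bolt Engineering"]
candidates = similar_pairs(names)
print(candidates)
```

The output is a review list, not a merge list. Similar names can still belong to different legal entities.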
5. Can I map my data?
In my experience, mapping is the key to data governance, and it is better to have these rules in one place rather than buried in code or spreadsheet formulas. When I look at a SKU, where can I find out what product line it’s in, whether it’s a promotional product and what promotion it is in?
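Here is a minimal sketch of keeping the mapping in one place. The SKU codes, product lines and promotion names are made up, and in practice the table would more likely live in a database or a governed reference file than in code.

```python
# Hypothetical mapping table: one place to look up what a SKU means
SKU_MAP = {
    "SKU-1001": {"product_line": "Widgets", "promotion": None},
    "SKU-1002": {"product_line": "Widgets", "promotion": "Spring Sale"},
    "SKU-2001": {"product_line": "Gadgets", "promotion": None},
}


def describe_sku(sku):
    """Look up a SKU's product line and promotion, flagging SKUs with no mapping."""
    entry = SKU_MAP.get(sku)
    if entry is None:
        # Unmapped SKUs are reported rather than silently dropped
        return {"sku": sku, "product_line": "UNMAPPED", "promotion": None, "is_promotional": False}
    return {"sku": sku, **entry, "is_promotional": entry["promotion"] is not None}


promo = describe_sku("SKU-1002")
unknown = describe_sku("SKU-9999")
print(promo)
print(unknown)
```

Reporting unmapped SKUs explicitly gives you a list of gaps to take back to whoever owns the source system.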
Follow these simple rules and you will find that the effort you put into preparing your data makes your analytics projects fly. They will also ensure that your analytics projects match your business’s strategy and give you a platform for managing the data in your source systems.