Eliminating Uncertainty through Clean Data
“Half the money I spend on advertising is wasted; the trouble is I don’t know which half,” John Wanamaker.
Uncertainty haunts marketers. A 2016 report by Accenture indicates that a majority of CMOs think that a large portion of their marketing spend fails to produce results. This lack of clarity has existed since advertising’s original geniuses forged best practices and extends well into the big data era.
The arrival of big data in many ways creates a new problem for marketing executives: paralysis. There is so much data coming at a marketing enterprise now that it becomes difficult to:
1) Delineate which data points matter
2) Integrate the data well so the information can interact to build complete customer life cycles
Even when a developer is deployed to connect the proverbial dots, interpretation becomes the devil. Every single human being brings their biases and prejudices to the table. Interpreting which data matters, why, and what it means becomes even cloudier.
Whole data sets are ignored based on sheer volume. Critical points are passed over. Obvious conclusions are dismissed. Personal beliefs drive decisions in the face of data that refutes them.
These types of biases confound marketing departments and agencies every day. AI can eliminate much of this uncertainty and add clarity to the argument. However, AI needs clean data, and that returns us to the big data mess.
The Data Issues Marketers Face
When a marketing department begins amassing data, it often becomes a big spaghetti incident with disparate data sources, different category names for the same data types, and even competing customer relationship management (CRM) systems to serve as the master database. There are no uniform standards for data that all businesses adhere to.
The resulting big data mess confounds enterprises, small and large. When data quality is low, the resulting analysis or machine learning output suffers, too. As data scientists like to say, “Garbage in, garbage out.”
Just collecting an entire enterprise’s data sources together in one master application can be a challenge, forcing Herculean projects that may take years. Application programming interfaces (APIs) need to be mapped; data integrity must be ensured; duplicate data must be eliminated; labels need consolidation across multiple source points, including the elimination of incorrect labels; missing data must be identified; and so on.
This is true even of smaller enterprises. Consider a mom and pop ice cream shop chain on the seashore. They may have a customer database of transactional information, including email addresses and/or phone numbers entered at points of sale, three social network presences (Facebook, Instagram, and Tinder), an email database on MailChimp that corresponds with their newsletter signup form, and a complimentary database of seasonal renters provided by their local chamber of commerce.
The mom and pop shop does not possess internal development resources and runs on a tight budget due to the seasonal nature of their business. They have no idea where to begin mining their data.
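As a rough illustration of where a shop like this could begin, here is a minimal sketch in Python, using only the standard library. It maps differing column labels onto one standard name, merges a point-of-sale export with a newsletter list, removes duplicates by email address, and counts missing fields. The column labels, sample customers, and function names are all hypothetical; real exports from a point-of-sale system or MailChimp will look different.

```python
# Hypothetical label variants mapped to one standard name per field.
LABEL_MAP = {
    "email": "email",
    "e-mail": "email",
    "email address": "email",
    "phone": "phone",
    "phone number": "phone",
    "name": "name",
    "full name": "name",
}


def normalize_record(record):
    """Rename known label variants to a standard name and tidy the values."""
    clean = {}
    for label, value in record.items():
        key = LABEL_MAP.get(label.strip().lower())
        if key is None:
            continue  # unknown label: skipped here, reviewed by a person in practice
        value = (value or "").strip()
        if key == "email":
            value = value.lower()
        if value:  # empty strings are treated as missing
            clean[key] = value
    return clean


def merge_sources(*sources):
    """Merge records from several sources, deduplicating on email address."""
    merged = {}
    no_email = []
    for source in sources:
        for raw in source:
            rec = normalize_record(raw)
            email = rec.get("email")
            if not email:
                no_email.append(rec)  # set aside rather than guess at a match
                continue
            existing = merged.setdefault(email, {})
            for key, value in rec.items():
                # Fill gaps without overwriting values already known.
                existing.setdefault(key, value)
    return list(merged.values()), no_email


def missing_fields(records, required=("name", "email", "phone")):
    """Count how many records lack each required field."""
    return {f: sum(1 for r in records if f not in r) for f in required}


# Made-up sample data standing in for two of the shop's sources.
point_of_sale = [
    {"Name": "Ana Ruiz", "Email Address": "Ana@Example.com ", "Phone Number": "555-0101"},
    {"Name": "Ben Cole", "Email Address": "", "Phone Number": "555-0102"},
]
newsletter = [
    {"Full Name": "Ana Ruiz", "E-mail": "ana@example.com"},
    {"Full Name": "Cy Dunn", "E-mail": "cy@example.com"},
]

customers, unmatched = merge_sources(point_of_sale, newsletter)
print(len(customers), "unique customers;", len(unmatched), "records without an email")
print(missing_fields(customers))
```

Matching on a lowercased email address is a deliberately simple choice. Records without an email are set aside rather than guessed at, since merging on names alone risks combining different people.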
Now imagine an international hotel chain with dozens of brands, thousands of properties spanning the globe, and hundreds of media outlets through which it promotes its various wares. Worse, each brand has its own series of CRM, marketing automation, social software, and content management platforms.
They each have their own advertising budgets and their own accounts on these media platforms. Some hotel property owners are franchisees and function independently. The exponential increase in data spaghetti hitting the wall is inconceivable to the human mind.
The big data mess hurts my head. I don’t think that’s a statement limited to creatives, either.
Consider that a data scientist spends 79% of her/his time assembling and preparing data for analysis, according to CrowdFlower. And 57% say the worst part of their job is cleaning the data.
But as a marketer, you may still resist the idea of becoming literate about data science, at least on a conceptual level. Make no bones about it: marketers need machine learning to resolve the big data mess.
Welcome to the Machine is not a choice for any brand seeking to extrapolate its data with a decent accuracy rate. Fight AI and risk losing your job to someone more willing to embrace workable solutions to a vexing problem.
Six Data Problems to Consider
Understanding the types of problems a data scientist seeks to eliminate before analyzing data helps. The scope and scale of data assembly and preparation should not be underestimated. Further, preparing data cannot be easily executed using a one-touch AI bot with a master data assembly script. Perhaps in the future, data scientists will create autonomous AI to alleviate some of their workload.
For now, here are some of the challenges facing marketers and their data scientists:
1) Incorrect Data: If you are in the people business (that is, your buyers are human), then you know your database is always out of date. People change jobs, move, delete email addresses, change phone vendors, and so on. This is known as data degradation, and a corporate database usually decays by 2–4% in any given month. Multiplied out, that comes to between 24% and 48% per year, with most businesses falling on the lower side of that range.
2) Missing Data: You might think simply interconnecting data through APIs would be enough to aggregate information. However, many data sets have incomplete data or, worse, whole fields missing. If a correlation exists between the missing data and the problem you are trying to solve, then the results of your analysis should be considered untrustworthy.
Datawatch recommends checking your data sources closely to ensure reliability. If you are using an application to pull data, make sure it works reliably. Remember that applications are made by humans, which makes them inherently fallible.
3) Insane Amounts of Data: Of course, there is the classic big data problem: too much data. Just because your organization hired a data science team does not mean this problem goes away in a flash. More is not better. Subject matter experts (people who understand your business objectives and how that data relates to them) must work with the data management team to vet and select the correct information sources.
Eventually, the data science team can put together a data governance model to assist in collecting data. If the company chooses, it can even assemble a machine learning tool to help collect and use data amid incredible amounts of source information.
4) Poor Data Quality: A data analysis is only as good as the data used to solve a problem. Fed poor-quality data, a machine learning algorithm will produce unreliable outcomes. Errors, typos, duplicates, and incorrect fields are just some of the problems that reduce the quality of a database. To overcome quality issues, organizations have to commit to treating data like a critical, valued asset.
5) Data Accessibility: If data is collected at the source, then there may be a massive challenge getting that data to a place where it can be analyzed. Consider how much data a deep learning self-driving car can generate. How will it be transmitted to a central data repository over a 4G wireless connection? Getting access to critical data, particularly data at the source, remains a huge issue for problem solvers.
6) Privacy: Marketers know the dangers of violating privacy, yet the pressures of success and the demands of the executive team can cause some questionable decisions, as discussed in the chapter on ethics. In other instances, data may be acquired through legal means, but the data subjects may not be aware of how you are using that data. Though they opted into a vague permission, they did not do so with your brand. Using the information may be deemed a violation of their rights, or at a minimum of their permission.
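The decay figures in the first item can be checked with a few lines of arithmetic. Multiplying the monthly rate by twelve gives the 24–48% range. If one instead assumes that each month's decay applies only to the records that are still accurate, and that nothing is refreshed along the way, the loss compounds and the annual figure comes out somewhat lower. The sketch below, in Python, compares the two. The function name is hypothetical, and the compounding behavior is an assumption made for illustration, not something the cited figures state.

```python
def annual_decay(monthly_rate, months=12):
    """Share of records gone stale after `months`, assuming decay compounds monthly."""
    return 1 - (1 - monthly_rate) ** months


for monthly in (0.02, 0.04):
    linear = monthly * 12               # simple multiplication, as in the text
    compounded = annual_decay(monthly)  # assumes no records are refreshed
    print(f"{monthly:.0%} per month: linear {linear:.0%}, compounded {compounded:.1%}")
```

Under that assumption, a 2% monthly rate works out to about 21.5% a year and a 4% rate to about 38.7%. Either way, a meaningful share of the database goes stale every year.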
These are just the data challenges marketing teams, enterprises, and data scientists face BEFORE beginning the actual work of trying to solve problems with data. A corporation’s first real endeavors working through the big data mess will encourage it to create data governance and to support stronger data curation practices across all industries.