Whether you work in a technical or a business role, it is essential to make sense of the sheer volume of data available and to convey your findings to others in a meaningful way.
But the people who are entrusted with interpreting and reporting this information, including the data scientists themselves, are still only human, each subject to their own biases and prejudices. We compensate for these biases in our observations by using mathematical models, by understanding how people read and respond to new information, and by learning what our own biases are.
Combining big data, social sciences and statistics can help solve big data problems, and will become a necessity when quick business decisions have to be based on good-quality data.
Building on Nick Smith’s recent blog on sensory data, one of the outcomes of big data will surely be the exponential growth of data that will be available to us. It will exist in a wide range of formats and data types, manifesting in anything from relational tables to video and audio formats. We are able to join our own internal and external data sets with other data, such as that from open data and social media, and find
But focusing on the right data is important. Neuroscience tells us that when the senses are bombarded with conflicting information, consolidating all of it becomes almost impossible.
In fact, when the brain is presented with massive amounts of data, it will tend to do one of two things. It will either ignore large chunks of it, especially if it does not confirm what we already know, or it will cherry-pick the data that conforms to what it already knows. So despite your best efforts to collect, process, store, analyse and present everything you have, the newest and most valuable insights may be slipping through the net.
That is an expensive and time-consuming loss. Not only that, businesses in this position are in danger of presenting false information. But research finds that when an individual with expertise is presented with new information and given enough time to think through the problem and process the data accordingly, they can produce genuine new insights without losing valuable information. Often, by providing business context beforehand so that we ask the right questions of the data, we can lighten our mental load and filter out the noise from the really important stuff.
Our natural ability to make sense of complex systems and find patterns can both help and hinder us. For example, an over-reliance on correlation is a potential danger when trying to craft a strong narrative. A quick visit to Spurious Correlations will give you an idea of what I mean. At face value, a strong correlation between two seemingly related variables can appear to imply a causal relationship. Sometimes this simply isn’t true.
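To see how easily this can happen, here is a minimal sketch in Python. It uses made-up, synthetic numbers rather than real data: two series that have nothing to do with each other except that both drift upwards over time. The function names, the seed and the size of the drift are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def trending_series(rng, n=200, slope=0.1):
    """Synthetic series: a steady upward drift plus independent noise."""
    return [slope * i + rng.gauss(0, 1) for i in range(n)]


def differences(xs):
    """Period-to-period changes; this removes the shared drift."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(42)  # fixed seed so the run is repeatable
series_a = trending_series(rng)
series_b = trending_series(rng)  # generated independently of series_a

raw_r = pearson(series_a, series_b)
diff_r = pearson(differences(series_a), differences(series_b))

print(f"correlation of raw series:     {raw_r:.2f}")
print(f"correlation of period changes: {diff_r:.2f}")
```

The raw series should come out strongly correlated simply because both rise over time, while the period-to-period changes should show a correlation close to zero. Neither series influences the other; the shared trend does all the work.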
Even a very large data set does not guarantee that a finding is meaningful. If you flip a coin 300,000 times and get heads 50.2% of the time, the difference from 50% can be calculated as statistically significant, even though it is far too small to matter in practice. Our ability to see patterns in data can also be our downfall when we see patterns that aren’t there.
Above: one of my favourite examples of spurious correlations from www.tylervigen.com. People are naturally quite imaginative, so it is easy to think that as you eat more margarine, your likelihood of getting divorced will increase as you become fatter.
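The coin-flip figure mentioned earlier can be checked with a few lines of Python. This is a rough sketch rather than a full statistical treatment: it assumes a fair coin as the baseline, uses the normal approximation to the binomial distribution, and the function name is my own.

```python
import math


def coin_flip_z_test(flips, heads_share):
    """Two-sided z-test of an observed share of heads against a fair coin.

    Uses the normal approximation to the binomial, which is reasonable
    for a large number of flips.
    """
    expected_heads = flips * 0.5
    std_dev = math.sqrt(flips * 0.5 * 0.5)
    z = (flips * heads_share - expected_heads) / std_dev
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = coin_flip_z_test(300_000, 0.502)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The z-score comes out at a little under 2.2 and the p-value at roughly 0.03, which is below the conventional 0.05 threshold. The result counts as statistically significant, yet the actual difference is only 0.2 percentage points, which is unlikely to matter for any practical decision.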
Another way of ensuring that information is not ignored is to present it in an accurate and meaningful way. People have a preference for visualisations, but they also latch on to information that is of personal significance to them. To cater for this, we can create narratives that describe patterns in the data, how it is changing, and why certain events are happening.
Neuroscience also tells us that people even have preferences for certain types of stories. We prefer stories with simple explanations over complex ones and stories that conform to what we already know, and we have a strong tendency to place too much weight on personal experience and anecdotal evidence, on occasion pushing more factual explanations out of the spotlight.
Depending on the problem, there could be a number of explanations for a particular pattern in a data set. It could easily be the case that no pattern exists at all. It is the job of the data consultant and data scientist to guide you through this process and to explain the best practices for future data collection, designing filters for any incoming data and developing the narrative as time goes on.
Different solutions require different degrees of certainty. For some companies, the power of big data lies not in knowing exactly why customers are behaving in a certain way, but simply in knowing that they are behaving in that way. This knowledge might not have been available otherwise, and focusing too much on the mechanics of the observation might mean that you miss the boat for a potential campaign or sale.
Other big data solutions require a greater degree of consistency (detecting fraud, ensuring a patient gets the correct treatment, or handling a customer’s money), and 99% certainty just might not be enough. For some organisations, using big data analytics to predict behaviour accurately matters more, and this is perfectly possible with current big data technologies.
Organisations are starting to see the value of neuroscience in realising big data’s full potential.
A deeper understanding of human attitudes and behaviour will allow businesses not just to detect trends faster and predict outcomes, but also to grow and continue to gain valuable insights.
By combining social science theory with findings from the analysis, you can end up with a continuous process that ensures incoming data is attended to while still leaving room for new insights, resulting in better actions and decisions.