How data has failed us

The rise of “uneducated” pundits, and the fall from grace of the “educated” elite.

2016 has been an interesting year for data analysis. Analysis is more precise than ever and has far greater reach, yet it failed to call two political processes in which the underdog seemingly defied all “logical” thought and triumphed, shaping our year and our future.

At the end of June, Britain made the monumental decision to leave the European Union, of which it has been a member since joining what was then the EEC in 1973. This referendum, the second of its kind since Britain joined, was the first sign that data analysis was failing to provide meaningful predictions in political discourse.

This was further backed up by the results just last week, when the candidate seen as the underdog because of his lack of experience and controversial taglines won the race and became President-elect.

Looking at Brexit alone, we can see that what seemed to be Big Data was sometimes not really Big Data at all: it lacked the variety in the “three Vs” definition (volume, velocity, variety). In the Brexit case, the information relied on traditional polling, betting, and social media. Due to cost and ease of access, traditional polling is often skewed towards population centers, in this case the London area, which everyone knew would vote to remain in the EU. Betting does reflect a better cross section of the population, but we have no way of knowing who is placing the bets or how diverse that population may be, though there is evidence that the people who bet on political events like this have greater financial resources and think they understand the outcome. Social media provides good statistics but is heavily dependent on the younger generation, who we already knew were heavily in favor of remaining, as opposed to the older generation, which was in favor of leaving.

In this respect the data was not wrong. It was just tasked to answer the wrong question.

Then we come to the 2016 US election, the first in quite a long time to feature a winning candidate without political experience (even earlier candidates without political experience had a wide breadth of military experience).

US election polls have long been thought of as a fairly foolproof way to show, with some certainty, how each candidate will fare.

Pollsters spend a year or more canvassing the country to answer the question the nation is asking: who will become President?

In retrospect, statisticians have identified a few factors that made polling an ineffective predictor of the actual result.

Social desirability bias

Trump claimed that some of his voters were too “shy” to admit publicly that they were voting for him. Maybe he was right. If people fear it is not socially acceptable to say they support Trump, they may lie to pollsters or avoid talking to them at all.

Underrepresentation of socio-economic groups

This again brings us back to the lack of variety in the data. Some pollsters have stated that they under-sampled non-college-educated whites, a group that Trump appealed to, relying on the belief that the nation’s changing demographics would dictate the election. The decline of landline telephones and the difficulty of obtaining mobile phone numbers have also been identified as issues that could lead to misrepresentation.
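To see how much under-sampling one group can move a headline number, here is a minimal sketch of reweighting a sample to match the electorate (often called post-stratification). Every number below is made up for illustration and is not real 2016 polling data; the group names, shares, and function names are assumptions of this sketch, not any pollster's actual method.

```python
# Hypothetical figures for illustration only; not real polling data.
population_share = {"college": 0.40, "non_college": 0.60}    # assumed share of the electorate
sample_counts = {"college": 600, "non_college": 400}         # assumed respondents per group
support_in_sample = {"college": 0.40, "non_college": 0.60}   # assumed support for candidate A


def raw_estimate(counts, support):
    """Support for candidate A if every respondent counts equally."""
    total = sum(counts.values())
    return sum(counts[g] * support[g] for g in counts) / total


def weighted_estimate(pop_share, support):
    """Support for candidate A after reweighting each group to its population share."""
    return sum(pop_share[g] * support[g] for g in pop_share)


raw = raw_estimate(sample_counts, support_in_sample)
weighted = weighted_estimate(population_share, support_in_sample)

print(f"raw estimate:      {raw:.2%}")
print(f"weighted estimate: {weighted:.2%}")
```

With these invented inputs, the unweighted sample puts candidate A behind, while the reweighted figure puts the same candidate ahead. Reweighting only helps if the assumed population shares are themselves right, which is exactly where a mistaken belief about demographics would creep back in.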

The number of polls has declined

Fewer polls means a greater chance of polling error, and there were simply fewer polls in 2016 than there were in 2012. Some battleground states had very few or not very reliable polls.
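A rough way to see why fewer polls leave more room for error is the textbook margin of error for a proportion. The sketch below assumes simple random samples and fully independent polls of equal size, which real polls are not, and the function names are invented for this example. It only covers random sampling error; a bias shared by every poll, such as the under-sampling described above, does not shrink however many polls are averaged.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


def averaged_margin(p, n_per_poll, num_polls, z=1.96):
    """Margin of error when pooling num_polls independent polls of equal size.

    Assumes the polls are independent and unbiased, so pooling them behaves
    like one larger sample. This is a simplification.
    """
    return margin_of_error(p, n_per_poll * num_polls, z)


single = margin_of_error(0.5, 1000)
for k in (1, 4, 16):
    moe = averaged_margin(0.5, 1000, k)
    print(f"{k:>2} polls of 1000 respondents: +/- {moe:.2%}")
```

Under these assumptions the margin shrinks with the square root of the number of polls, so a state with one poll carries roughly twice the sampling uncertainty of a state with four.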

Ultimately, the moral of the story is that big data did not fail. The people driving the data collection failed, and this provides an important lesson for the future: you need not only statistically relevant data but also data that reflects the nature of the real world. In this case, the data collected didn’t reflect the strong division or deep anger of those most likely to vote “leave” and “Trump”. Just because the data is there does not mean it tells the right story.

As a supplement, I looked into how positive responses to memes relating to each US candidate changed over the course of the election:

Maybe we can use more eclectic polling to some effect, but we should do so knowing that we will fall foul of the variety requirement, which reduces the reliability of our conclusions.