Trying to Understand Data
The hottest field in software right now is Data Science. There are plenty of courses on it, and I am taking one now on edX, sponsored by Microsoft.
Right away, I can see some problems – they have dragged in data from everywhere, dumped it into one big pile (a dataset), and then tried to make sense of it. They should be asking themselves, for each piece of data, “What kind of data is this? Where did it come from? How good is it?”
Digital data is just a collection of bits (ones and zeros) that represent something. Your name, for example, is a sequence of bits grouped into bytes, with one or more bytes representing each letter under a particular character encoding, often Unicode.
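To make this concrete, here is a minimal sketch in Python of a name turned into bits. The name "Zoë" and the helper `to_bits` are made up for illustration, and UTF-8 is assumed as the encoding (one common way of storing Unicode text, not the only one).

```python
def to_bits(text: str, encoding: str = "utf-8") -> list[str]:
    """Return one 8-bit string per byte of the encoded text."""
    return [format(byte, "08b") for byte in text.encode(encoding)]


# Hypothetical name; "\u00eb" is the letter e with a diaeresis.
name = "Zo\u00eb"

for letter in name:
    # ord() gives the Unicode code point; to_bits() gives the stored bytes.
    print(letter, f"U+{ord(letter):04X}", to_bits(letter))
```

Plain letters such as "Z" take one byte in UTF-8, while the accented letter takes two. Nothing in those bits says whose name it is, where it came from, or how reliable it is.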
Your name is important, to you and to many other people – but this importance is lost when it is turned into data.
It doesn’t have to be lost. That importance could easily be recorded alongside the bits that form your name, along with a lot of other information that everyone would like to know. But in most cases it is thrown away when the name is turned into data, and data scientists then do a great deal of work trying to reconstruct it.
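As a rough sketch of what keeping that information might look like, the record below carries a value together with answers to "What kind of data is this? Where did it come from? How good is it?" The class `AnnotatedValue`, its field names, the `usable` helper, and the 0.8 threshold are all my own inventions for illustration, not something from the course or from any library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AnnotatedValue:
    value: str          # the data itself
    kind: str           # what kind of data is this?
    source: str         # where did it come from?
    collected_on: date  # when was it collected?
    consent: bool       # did the person agree to its collection?
    quality: float      # how good is it? 0.0 (worthless) to 1.0 (verified)


def usable(item: AnnotatedValue, min_quality: float = 0.8) -> bool:
    """Accept a value only if it was given with consent and is good enough."""
    return item.consent and item.quality >= min_quality


# Two made-up records holding the same name with very different histories.
signup = AnnotatedValue("Zo\u00eb", "personal name", "signup form",
                        date(2017, 1, 15), consent=True, quality=0.95)
scraped = AnnotatedValue("Zo\u00eb", "personal name", "scraped web page",
                         date(2017, 1, 15), consent=False, quality=0.40)

print(usable(signup), usable(scraped))
```

The two values are identical as bits; only the attached fields tell them apart.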
And this work goes into databases containing information about everyone – detailed information, obtained without your knowledge or consent – maintained in many secret places.
Sometimes this information is leaked because of poor security, and if the leaker is caught, he is treated as the worst criminal in the world. Meanwhile, the people he got the information from, who sometimes used torture to obtain it, are not bothered at all. They were just doing their job.
Much of Data Science is Predictive Analytics, which examines data collected in the past and predicts what will happen in the future, sometimes successfully and sometimes not. But its consumers tend to overlook its failures in their eagerness to profit from its successes, in predicting stock prices, for example.
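The idea of fitting past data and extrapolating it forward can be sketched in a few lines. This is a toy, assuming the simplest possible model (a straight line fitted by ordinary least squares); the prices are invented numbers, not real market data, and the function names are my own.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate the fitted line to a new x."""
    slope, intercept = model
    return slope * x + intercept


# Invented past prices for days 1 to 5: a steady climb.
days = [1, 2, 3, 4, 5]
prices = [10.0, 11.0, 12.0, 13.0, 14.0]

model = fit_line(days, prices)
forecast = predict(model, 6)   # the line simply keeps climbing

actual = 9.0                   # invented: the price drops on day 6 instead
error = forecast - actual
print(forecast, actual, error)
```

The model can only repeat the pattern it was shown. If day 6 breaks the pattern, the forecast is wrong, and nothing in the formula warns you in advance.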
This data (about the market) can be manipulated, and has been for a long time. This is what caused the financial collapse of 2008, which cost many people a lot of money, but not those who caused it, who got away scot-free.
I saw this happening back in the Nineties, when I was working in Silicon Valley. Everyone was playing the market, speculating in computer stocks. This provided huge amounts of money for any company that could claim to be a computer company, and it produced a boom that I marveled at, at the time – but no one else did. They had been trained to notice nothing. The boom was followed by a bust, as always, which forced many people to leave the Valley, including me.
The Information Economy existed, based on Big Data – but no one understood it. And they still don’t understand it.
I am trying to, by taking this course – but I am here to tell you that much of Data Science is black magic. They tell you which formulas to use, based on past experience, and you use them, knowing that the company that employs you, and pays you well, won’t know the difference.
If your recommendations don’t work, you can just say “Too bad!” and move on to another company that wants your black magic.
Businessmen are idiots – and simply follow the crowd.