Every new technology seems like magic from the vantage of previous technologies. And if anything, we’ve become spoiled by the sheer pace and magnitude of technological advance, particularly since the advent of digital technologies within the past few decades. From identifying real-time flu trends to auto-piloting cars on busy streets, the latest generation of innovation is tantalizing us with the prospect of knowing anything we want instantly, and even predicting the future.
The reality is less magical, or at least riddled with caveats. Until we find a better way around our four-dimensional reality, the best computers can do is quickly digest huge amounts of information about the past and present.
Accurately surveilling the present and predicting the future is not so much a question of technological majesty as one of human ingenuity. More specifically, the challenge in data mining and predictive analytics is to model the past and present accurately, to paint a relevant picture using the smallest data palette possible. From there, the challenge is to focus the analysis toward the goal in a simple, efficient manner.
Anyone who’s worked in the information business will have observed that the answer to each question, far from closing the matter, seems to beget more questions. It’s easier to add more categories, more fields and conditions, to load the boat with every item we might conceivably need than to embrace efficiency. And yet that efficiency is what separates clear studies from ambiguous ones, reliable output from the not-so-much.
Occam's Razor was articulated to reduce the likelihood of error in attempting to explain a phenomenon, and while we no longer need to ferret out the causes behind the correlations we uncover, we still have to ensure the ontological relevance of the questions we’re investigating. Put another way, our model should correspond to the reality it seeks to represent, and should do so only in the aspects that are relevant to what we’re trying to know.
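To make the parsimony point concrete, here is a minimal sketch, using synthetic data and scikit-learn (neither of which the essay assumes), comparing a lean model against one loaded down with irrelevant fields:

```python
# Illustrative sketch only: synthetic data, hypothetical features.
# The point is that padding a model with fields it doesn't need rarely
# helps, and often hurts, out-of-sample reliability.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Three features that actually drive the outcome...
relevant = rng.normal(size=(n, 3))
y = relevant @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)

# ...plus fifty noise columns "loaded onto the boat".
noise = rng.normal(size=(n, 50))
bloated = np.hstack([relevant, noise])

for name, X in [("lean model", relevant), ("bloated model", bloated)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    score = LinearRegression().fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: held-out R^2 = {score:.3f}")
```

The bloated model has more to work with but nothing more to say; on held-out data it typically scores no better than the lean one, and often a little worse.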
It’s an artful, creative process that requires analytic skill as well as strategic insight, not to mention a clear goal or question. Better tools can help overcome sparse dimensions and partial data, but no analysis can compensate for a flawed model or ontology. Google Flu Trends, at least in its first iteration, exemplifies the point. The data you’re modeling must be faithful to the reality it describes, and if that reality is shifting, the algorithm needs to be able to correct for that dynamic.
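One common way to build in that correction is to retrain on a rolling window of recent observations, so the model tracks a shifting relationship rather than freezing an outdated one. The sketch below is illustrative only, on synthetic data, and is not Google's published method:

```python
# Illustrative sketch, not Google's method: retrain on a rolling window
# so the model can adapt when the relationship it captures drifts.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
weeks = 200
x = rng.normal(size=(weeks, 1))

# The "true" relationship flips midway through -- e.g. search behaviour
# shifts even though the underlying flu signal does not.
coef = np.where(np.arange(weeks) < 100, 3.0, -1.0)
y = coef * x[:, 0] + rng.normal(scale=0.3, size=weeks)

window = 26  # retrain each week on roughly the last half-year
static = LinearRegression().fit(x[:window], y[:window])  # fit once, never updated

rolling_err, static_err = [], []
for t in range(window, weeks):
    rolling = LinearRegression().fit(x[t - window:t], y[t - window:t])
    rolling_err.append(abs(rolling.predict(x[t:t + 1])[0] - y[t]))
    static_err.append(abs(static.predict(x[t:t + 1])[0] - y[t]))

print(f"rolling-window mean error: {np.mean(rolling_err):.2f}")
print(f"frozen-model mean error:   {np.mean(static_err):.2f}")
```

The frozen model keeps applying the old relationship long after the world has moved on; the rolling retrain at least gives the analysis a chance to notice.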