Friday, February 8, 2013

Beware Big Errors in Big Data - Taleb

This excerpt comes via Wired.com. The whole article can be found here.

But beyond that, big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal). It’s a property of sampling: In real life there is no cherry-picking, but on the researcher’s computer, there is. Large deviations are likely to be bogus.



This is the tragedy of big data: The more variables, the more correlations that can show significance. Falsity also grows faster than information; it is nonlinear (convex) with respect to data (this convexity in fact resembles that of a financial option payoff). Noise is antifragile. Source: N.N. Taleb
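A minimal sketch of the "more variables, more correlations" point, not taken from the Wired article: it generates independent Gaussian noise series and counts how many pairs look correlated anyway. The function names (`pearson`, `count_spurious`), the 50 observations per series, and the 0.28 cutoff are all assumptions chosen for illustration; 0.28 is roughly the two-sided 5% critical value of the sample correlation at that sample size, so about one pair in twenty should clear it by chance alone.

```python
import itertools
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_vars, n_obs=50, threshold=0.28, seed=0):
    """Count pairs of independent noise series with |r| above threshold.

    Every series is pure noise, so each pair counted here is spurious.
    The threshold is an illustrative assumption, not a rigorous test.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_vars)
    ]
    hits = sum(
        1
        for a, b in itertools.combinations(series, 2)
        if abs(pearson(a, b)) > threshold
    )
    pairs = n_vars * (n_vars - 1) // 2
    return hits, pairs


if __name__ == "__main__":
    for n_vars in (10, 50, 100):
        hits, pairs = count_spurious(n_vars)
        print(f"{n_vars:4d} variables, {pairs:5d} pairs, {hits:4d} above threshold")
```

The number of pairs is n(n-1)/2, so it grows quadratically with the number of variables while the amount of real signal in this toy setup stays at zero. That is one simple way to read the claim that falsity grows faster than information.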

Well, this just made me question my entire effort to hand off some of my decision making to automated information sources and their results. It just goes to show that any model should be validated against other observations.
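Here is a rough sketch of what validating against other observations can look like, under made-up assumptions: a target series and many candidate predictors are all independent noise, the candidate with the strongest in-sample correlation is picked (the cherry-picking step), and then it is re-checked on fresh observations. The names `pearson` and `best_then_retest`, the 500 candidates, and the 50 observations are hypothetical choices for illustration only.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def best_then_retest(n_candidates=500, n_obs=50, seed=1):
    """Pick the noise predictor that best matches a noise target,
    then measure the same predictor on a fresh, independent sample.

    Returns (in_sample_r, fresh_sample_r) for the selected candidate.
    """
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0.0, 1.0) for _ in range(n_obs)]

    target_first, target_fresh = noise(), noise()
    best_r, best_fresh_r = 0.0, 0.0
    for _ in range(n_candidates):
        cand_first, cand_fresh = noise(), noise()
        r = pearson(cand_first, target_first)
        if abs(r) > abs(best_r):
            # Selection uses only the first sample; the fresh sample
            # is recorded but never influences which candidate wins.
            best_r = r
            best_fresh_r = pearson(cand_fresh, target_fresh)
    return best_r, best_fresh_r


if __name__ == "__main__":
    in_sample, fresh = best_then_retest()
    print(f"selected predictor, first sample: r = {in_sample:+.2f}")
    print(f"same predictor, fresh sample:     r = {fresh:+.2f}")
```

Because nothing here carries any real signal, the winning in-sample correlation is purely a product of searching through many candidates, and there is no reason for it to persist on the fresh sample. A relationship that does survive a check like this is at least harder to explain as cherry-picked noise.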

