Big Data Pitfalls

Avoid Simpson’s paradox:
This paradox refers to a phenomenon in which the association between a pair of variables (X, Y) reverses sign upon conditioning on a third variable Z, regardless of the value taken by Z. If we partition the data into subpopulations, each representing a specific value of the third variable, the phenomenon appears as a sign reversal between the associations measured in the disaggregated subpopulations and the association measured in the aggregated data, which describes the population as a whole.
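A minimal sketch of the reversal described above, assuming a binary outcome (success or failure), a treatment variable X with values "A" and "B", and a grouping variable Z with values "small" and "large". The counts and all names here are chosen purely for illustration and do not come from this text.

```python
# Illustrative counts only: data[z][x] = (successes, trials)
data = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def subgroup_rates(data):
    """Success rate of each treatment within each subpopulation (conditioned on Z)."""
    return {
        z: {x: s / n for x, (s, n) in by_x.items()}
        for z, by_x in data.items()
    }

def aggregated_rates(data):
    """Success rate of each treatment after pooling all subpopulations."""
    totals = {}
    for by_x in data.values():
        for x, (s, n) in by_x.items():
            s0, n0 = totals.get(x, (0, 0))
            totals[x] = (s0 + s, n0 + n)
    return {x: s / n for x, (s, n) in totals.items()}

per_group = subgroup_rates(data)
overall = aggregated_rates(data)

for z, rates in per_group.items():
    print(f"{z:>5}: A={rates['A']:.3f}  B={rates['B']:.3f}")
print(f"  all: A={overall['A']:.3f}  B={overall['B']:.3f}")
```

With these counts, A has the higher success rate inside every subpopulation, yet B has the higher rate in the pooled data, because A was applied far more often to the harder "large" group. Comparing the aggregated figure against the per-group figures before drawing a conclusion is one way to catch the problem.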

Use the right ML algorithms: match the machine learning algorithm to your specific problem rather than defaulting to a familiar one. For example, if you need a numeric prediction quickly, use decision trees or linear regression; logistic regression is suited to classification rather than numeric prediction.

Keep in mind the Prisoner’s Dilemma: choices that are rational for each party individually can leave every party worse off than cooperation would. As a well-known example, “cigarette manufacturers endorsed the making of laws banning cigarette advertising, understanding that this would reduce ad costs for parties and increase profits across the industry”. The same dynamic applies to business strategy and, further down, to big data processing.
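The structure of the advertising example can be sketched as a two-player payoff table. The profit figures, the move names, and the helper `best_response` are hypothetical, chosen only so that the table has the shape of a Prisoner’s Dilemma.

```python
# Hypothetical profits for two competing firms; values are illustrative only.
# PAYOFFS[(move_a, move_b)] = (profit_a, profit_b)
PAYOFFS = {
    ("advertise", "advertise"): (1, 1),
    ("advertise", "abstain"): (4, 0),
    ("abstain", "advertise"): (0, 4),
    ("abstain", "abstain"): (3, 3),
}
MOVES = ("advertise", "abstain")

def best_response(opponent_move):
    """Firm A's profit-maximizing move against a fixed move by firm B."""
    return max(MOVES, key=lambda move: PAYOFFS[(move, opponent_move)][0])

# Advertising is the best reply whatever the competitor does (a dominant strategy).
dominant = {opp: best_response(opp) for opp in MOVES}

# Yet both firms earn more when neither advertises.
equilibrium_profit = PAYOFFS[("advertise", "advertise")][0]
cooperative_profit = PAYOFFS[("abstain", "abstain")][0]

print(dominant)
print(equilibrium_profit, cooperative_profit)
```

Under these assumed payoffs each firm advertises when acting alone, while an enforced ban moves both to the more profitable outcome, which is why a rule that binds everyone can be in every party's interest.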

Consider Gödel’s incompleteness theorem: any consistent formal system expressive enough to describe arithmetic (number theory, etc.) contains statements that are true but cannot be proved from the rules within that system. In this sense the system transcends itself, which some see as relevant to the path toward strong AI.

Keep in mind the quantum computers of the future, which are expected to be exponentially faster than classical machines on certain problems. For example, build cryptographic algorithms that are designed to resist attacks by such machines.