Big Data Pitfalls

Avoid Simpson’s paradox:
This paradox refers to a phenomenon in which the association between a pair of variables (X, Y) reverses sign upon conditioning on a third variable, Z, regardless of the value taken by Z. If we partition the data into subpopulations, each representing a specific value of the third variable, the phenomenon appears as a sign reversal between the associations measured in the disaggregated subpopulations and the association measured in the aggregated data, which describes the population as a whole.
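
The sign reversal can be checked mechanically. The following is an added example, not part of the original text: the counts are constructed, X is a variant ("A" or "B"), Y is success, Z is a segment ("small" or "large"), and all function names are illustrative.

```python
from collections import defaultdict

# Constructed counts: (z, x) -> (successes, trials)
COUNTS = {
    ("small", "A"): (18, 20),
    ("small", "B"): (64, 80),
    ("large", "A"): (40, 80),
    ("large", "B"): (8, 20),
}

def association(counts):
    """Association between X and Y: P(Y=1 | X=A) - P(Y=1 | X=B),
    pooled over every (z, x) cell passed in."""
    totals = defaultdict(lambda: [0, 0])
    for (_z, x), (successes, trials) in counts.items():
        totals[x][0] += successes
        totals[x][1] += trials
    return totals["A"][0] / totals["A"][1] - totals["B"][0] / totals["B"][1]

def stratified(counts):
    """The same association measured separately inside each value of Z."""
    strata = sorted({z for z, _x in counts})
    return {
        z: association({k: v for k, v in counts.items() if k[0] == z})
        for z in strata
    }

def is_simpson_reversal(counts):
    """True when every stratum has the opposite sign to the aggregate."""
    aggregate = association(counts)
    per_stratum = stratified(counts).values()
    return aggregate != 0 and all(
        d != 0 and (d > 0) != (aggregate > 0) for d in per_stratum
    )

if __name__ == "__main__":
    print("aggregate :", round(association(COUNTS), 2))
    for z, d in stratified(COUNTS).items():
        print(f"Z={z:<6}:", round(d, 2))
    print("reversal  :", is_simpson_reversal(COUNTS))
```

Variant A wins inside both segments yet loses in the pooled data, because A was mostly tried on the harder "large" segment; the conclusion depends on whether Z is conditioned on.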

Use the right ML algorithms: choose the machine learning approach that is appropriate for your specific problem. For example, if you need a numeric prediction quickly, use decision trees or logistic regression.

Keep in mind the Prisoner’s Dilemma: as in “cigarette manufacturers endorsed the making of laws banning cigarette advertising, understanding that this would reduce ad costs for parties and increase profits across the industry”, so it is with business strategy, all the way down to big data processing.

Consider Gödel’s Theorem: for any system of computation you can construct (number theory etc.) that is true, this cannot ultimately be proved from the rules within that computational construct. The system in a way transcends itself. Hence, for example, the way to strong AI.

Keep in mind the exponentially powerful quantum computers of the future. For example, build different cryptographic algorithms that are resistant to the future power of qubits.

Software Security Vulnerabilities

Security vulnerabilities differ in their cause, the possibility of being exploited, the degree of harm and the difficulty of fixing them.

1. Input Validation and Representation
Input validation and representation problems are usually caused by special characters, encodings, and numerical representations. Such problems occur as a result of trusting input. These problems include buffer overflow, cross-site scripting, SQL injection, command injection and so on.
2. API Abuse
The API is a convention between the caller and the callee, and most API abuses are caused by the caller not understanding the purpose of the convention. Security problems can also arise when the API is not used properly.
3. Security Features
This category contains vulnerabilities in authentication, access control, confidentiality, password usage, and privilege management.
4. Memory Management
Memory management vulnerabilities are a common type of vulnerability associated with memory operations, including memory leaks, use after free, double free and so on. This type of vulnerability usually leads to system performance degradation or program crashes, and is a common type of flaw in the C/C++ languages.
5. Time and State
Distributed computing is time and state dependent. The interaction between threads and processes and the order in which tasks are executed are often determined by shared state, such as semaphores, variables, file systems and so on. The vulnerabilities associated with distributed computing include race conditions, blocking misuse and so on.
6. Error and Exception Handling Errors
This type of vulnerability is related to error and exception handling. The most common case is that there is no proper processing mechanism (or errors are not processed at all), resulting in unexpected termination of the program. Another is that the error generated provides a potential attacker with too much information.
7. Code Quality
Poor code quality can lead to unpredictable behavior. For the attacker, poor code makes it possible to threaten the system in unexpected ways. Common types of vulnerabilities include dead code, null pointer dereferences and resource leaks.
8. Encapsulation and hidden defects
Reasonable encapsulation means distinguishing between verified and unverified data, between the data of different users, and between data that is visible and data that is invisible to users. Common vulnerabilities include hidden fields, information leakage, cross-site request forgery and so on.
9. Flaws in Code Runtime Environment
This type of vulnerability is external to the source code, such as runtime configuration issues, sensitive information management issues and so on, which are critical to product security.

The first eight types of vulnerabilities are related to security flaws in the source code. They can be the target of malicious attacks. Once exploited, they can cause serious consequences such as information leaks, privilege escalation and command execution. The last type of vulnerability describes security concerns that are external to the actual code. They are likely to cause abnormal operation of the software, data loss and other serious problems. (http://www.bikaifa.com)
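
The input-trust problem from category 1, shown for SQL injection as an added example that is not part of the original text: the table, the function names and the name allow-list are assumptions made for illustration, and every input string is constructed.

```python
import re
import sqlite3

# Assumed allow-list for a "name" field: letters, spaces, hyphens, apostrophes.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z' -]{0,63}")

def make_db():
    """In-memory table holding two constructed rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, role TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [("alice", "admin"), ("O'Brien", "user")])
    return db

def is_valid_name(name):
    """Validation step: accept only what the field is supposed to contain."""
    return bool(NAME_RE.fullmatch(name))

def find_user_trusting(db, name):
    """Weak form: the input is pasted into the SQL text, so a quote
    character inside the data is parsed as SQL syntax."""
    sql = "SELECT name, role FROM users WHERE name = '" + name + "'"
    return db.execute(sql).fetchall()

def find_user_bound(db, name):
    """Hardened form: the value travels as a bound parameter and is
    compared as data, never parsed as part of the statement."""
    return db.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()

if __name__ == "__main__":
    db = make_db()
    crafted = "x' OR 'a'='a"          # constructed input containing a quote
    print(find_user_trusting(db, crafted))   # predicate changed: every row
    print(find_user_bound(db, crafted))      # no such name: empty list
    print(find_user_bound(db, "O'Brien"))    # legitimate quote handled
    print(is_valid_name("O'Brien"), is_valid_name(crafted))
```

A legitimate name such as O'Brien passes validation and still breaks the concatenated query, which is why validation is combined with parameter binding rather than used in place of it.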

10. With the advances in quantum computer technology there is more and more concern that current cryptographic security algorithms will soon be obsolete. A quantum computer will be able to break keys in a matter of days. Hackers are already preparing to gain that computing power, and even today they secure certificates and keys. Also in danger in that quantum era are the current blockchains, whose keys use the same technology as today's.