Category: Big Data
Big data refers to data sets so large or complex that traditional data-processing software cannot handle them adequately. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating, and information privacy.
Big Data Pitfalls
Avoid Simpson’s paradox:
This paradox refers to a phenomenon in which the association between a pair of variables (X, Y) reverses sign upon conditioning on a third variable Z, regardless of the value taken by Z. If we partition the data into subpopulations, each representing a specific value of the third variable, the phenomenon appears as a sign reversal between the associations measured in the disaggregated subpopulations and the association measured in the aggregated data, which describes the population as a whole.
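A minimal sketch of the reversal, using made-up counts chosen only for illustration (the strata names `easy` and `hard`, the treatments `A` and `B`, and every number are assumptions, not real data). Here X is the treatment, Y is success, and Z is the stratum; the association is measured as the difference in success rates.

```python
# Hypothetical counts: (successes, trials) for each treatment X within each stratum Z.
counts = {
    "easy": {"A": (90, 100), "B": (800, 1000)},
    "hard": {"A": (300, 1000), "B": (20, 100)},
}


def rate(successes, trials):
    """Success rate for one group."""
    return successes / trials


def rate_difference(a, b):
    """Success rate of A minus success rate of B; its sign is the direction of association."""
    return rate(*a) - rate(*b)


def pooled(treatment):
    """Sum successes and trials for one treatment across all strata."""
    successes = sum(group[treatment][0] for group in counts.values())
    trials = sum(group[treatment][1] for group in counts.values())
    return successes, trials


# Association measured inside each subpopulation (disaggregated data).
per_stratum = {z: rate_difference(g["A"], g["B"]) for z, g in counts.items()}

# Association measured on the population as a whole (aggregated data).
aggregate = rate_difference(pooled("A"), pooled("B"))

for z, diff in per_stratum.items():
    print(f"Z={z}: rate(A) - rate(B) = {diff:+.3f}")
print(f"aggregated: rate(A) - rate(B) = {aggregate:+.3f}")
```

With these counts, A does better than B inside every stratum, yet B looks better once the strata are pooled, because A was applied mostly to the hard cases.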
Use the right ML algorithm: choose the machine learning approach that fits your specific problem rather than reaching for a default. For example, if you need a quick numeric prediction, try a decision tree or linear regression; if you need a quick classification, a decision tree or logistic regression is a reasonable starting point.
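As one possible illustration of a quick baseline for a numeric target, here is a small sketch of ordinary least-squares line fitting in plain Python. The helper name `fit_line` and the data are hypothetical; a real project would more likely use an established library and validate the choice against the data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data that happens to lie on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Numeric prediction for an unseen input.
prediction = slope * 5.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={prediction:.2f}")
```

A straight-line fit like this only suits a numeric target; a categorical target calls for a classifier instead.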
Keep in mind the Prisoner’s Dilemma: parties that each act in their own interest can end up worse off than if they had cooperated. A common illustration: “cigarette manufacturers endorsed the making of laws banning cigarette advertising, understanding that this would reduce ad costs for parties and increase profits across the industry”. The same logic applies to business strategy and, further down, to big data processing.
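The advertising example can be sketched as a two-player payoff table. The profit figures below are invented for illustration, and the function names are hypothetical; the point is only that each firm's best response is to advertise, although both would earn more if neither did.

```python
from itertools import product

ACTIONS = ("advertise", "abstain")

# Hypothetical profits as (firm_1, firm_2) for each pair of choices.
PAYOFFS = {
    ("advertise", "advertise"): (5, 5),
    ("advertise", "abstain"): (10, 2),
    ("abstain", "advertise"): (2, 10),
    ("abstain", "abstain"): (8, 8),
}


def best_response(player, other_action):
    """Return the action that maximizes this player's profit given the other's action."""
    def payoff(action):
        profile = (action, other_action) if player == 0 else (other_action, action)
        return PAYOFFS[profile][player]
    return max(ACTIONS, key=payoff)


def nash_equilibria():
    """List the action pairs where each firm is already playing its best response."""
    return [
        (a1, a2)
        for a1, a2 in product(ACTIONS, repeat=2)
        if best_response(0, a2) == a1 and best_response(1, a1) == a2
    ]


equilibria = nash_equilibria()
print("equilibria:", equilibria)
print("profits at equilibrium:", PAYOFFS[equilibria[0]])
print("profits if both abstain:", PAYOFFS[("abstain", "abstain")])
```

Under these assumed payoffs, a binding rule that removes the "advertise" option, such as an advertising ban, moves both firms to the better joint outcome.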
Consider Gödel’s incompleteness theorems: any consistent formal system expressive enough to describe arithmetic (number theory, for example) contains true statements that cannot be proved from the rules of that system. In a sense, the system points beyond itself. This is sometimes brought up in discussions about the path to strong AI.
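Keep in mind the quantum computers of the future, which are expected to far outperform classical machines on certain problems, such as the integer factoring that underpins widely used public-key cryptography. For example, build or adopt cryptographic algorithms designed to resist attacks by quantum computers.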
Job Breakthroughs
Startup vs. Larger Company:
The appeal of working for a smaller company is that you get to make more of an impact. A larger corporation might offer more benefits or a higher salary, but a startup is where you can really make a difference and see the influence your work has on the business. You are heavily involved in each stage of production, and your opinion is more likely to carry weight than at a larger, more structured operation. Looking ahead, big companies could be decentralized through tokenization, with shares issued through ICOs (initial coin offerings).
Jobs in IT:
The areas to watch are artificial intelligence, the Internet of Things, data security, virtual and augmented reality, virtual worlds (and virtual assets), and a bank-less Internet of payments built on a backbone of free nodes. Jobs to look at, or related ones: big data engineer, Software 2.0 engineer (maintaining neural networks that write code), full-stack developer, security engineer, IoT architect, VR/AR engineer, and hybrid engineers. Hybrid engineers bring an agile mindset to their teams and solid knowledge of technology stacks, and by working together they can bind different ends of the domain spectrum, much as DevOps does for the “from code to infrastructure” mindset. Another role is running the decentralized Internet, sustained by blockchain and similar technologies yet to come, which backs the virtual assets in the virtual worlds of the decentralized network.
Thus the skills needed to succeed in the IT jobs of tomorrow revolve around security certifications, programming and application development, proficiency with cloud, decentralized architectures, and mobile technologies, and other specialized skill sets. These also give way to hybrid IT roles that bind the business to IT.
Roles grow vertically along two axes: business domain and technology stack. For example, a Solutions Architect has business domain knowledge as well as a technical background, and develops complex technology solutions in a specific business domain. A Software Architect knows the technology stacks in greater depth and designs the architecture of the technical implementation. A Technical Lead has deeper knowledge of the technology stack, or of a part of it; this person designs using established patterns, coaches teams in the adopted technologies, and unblocks teams so that projects are delivered successfully.
Data Scientists: it is essential for data scientists to work with languages and tools such as R, Python, and SAS, and with data platforms such as Hadoop and Netezza, applying their knowledge of statistics and mathematics (linear algebra and matrices, multivariable calculus). They should also know platforms and services such as MapReduce, GridGain, HPCC, Storm, Hive, Pig, and Amazon S3.
The user as a valuable “in the network” resource, in parallel digital universes (e.g. the Metaverse): users’ actions should be monetized and generate income. We are producing valuable data even now, simply by browsing Facebook, Google, and social networks, and those systems use it to improve themselves (the long-term plan is to build the future AI systems together). “Internaut” will be one of the nicest jobs of the future.