Prerequisites for data science:
Becoming a data scientist requires a candidate to have skills in several fields, such as software development, database query languages, machine learning, programming, mathematics, statistics and data visualization.
Statistics needed for data science:
Statistics is a broad field with applications in many industries. It can be defined as the study of the collection, organisation, analysis, interpretation and presentation of data. At a minimum, data analysis requires descriptive statistics and probability theory.
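As a small illustration of descriptive statistics, the sketch below summarises a sample with Python's standard `statistics` module. The `daily_sales` numbers are made up for the example and do not come from any real data set.

```python
import statistics

# Hypothetical sample: units sold per day in one store (made-up numbers).
daily_sales = [12, 15, 11, 19, 15, 22, 18]

mean_sales = statistics.mean(daily_sales)      # central tendency
median_sales = statistics.median(daily_sales)  # middle value, robust to outliers
stdev_sales = statistics.stdev(daily_sales)    # sample standard deviation (spread)

print(f"mean={mean_sales} median={median_sales} stdev={stdev_sales:.2f}")
```

The mean and median describe where the data is centred, while the standard deviation describes how widely it is spread.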
Here are the three steps to learning the statistics and probability required for data science:
- Core statistics concepts
- Bayesian thinking
- Introduction to statistical machine learning
Core statistics concepts:
We should know which statistics concepts data science relies on and how they are used. Here are some examples of real analyses that we may need to implement:
- Experimental design: The company is rolling out a new product line, but it sells its products through offline retail stores, so we need to design an A/B test that controls for differences across geographies.
- Regression modelling: The company needs to forecast the demand for individual product lines in its stores more accurately. Under-stocking and over-stocking are both expensive, so it should consider building a series of regularised regression models.
- Data transformation: We are testing multiple machine learning model candidates. A few of them assume specific probability distributions of the input data, so we need to be able to identify those models and transform the input data accordingly.
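For the data transformation example above, one common approach (an assumption here, not the only option) is a log transform, which pulls in the long right tail of a skewed feature. The sketch below measures skewness before and after the transform. The `order_values` numbers and the `skewness` helper are hypothetical and written only for this illustration.

```python
import math
import statistics

def skewness(values):
    """Population skewness: the mean cubed z-score (0 for symmetric data)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

# Hypothetical right-skewed input feature (made-up numbers).
order_values = [1, 2, 2, 3, 4, 5, 8, 13, 40, 120]

# The log transform compresses large values much more than small ones.
log_values = [math.log(v) for v in order_values]

print(f"skewness before: {skewness(order_values):.2f}")
print(f"skewness after:  {skewness(log_values):.2f}")
```

A skewness closer to zero means the transformed feature is more symmetric, which suits models that assume roughly normal inputs better.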
Bayesian thinking:
One of the long-standing debates in statistics is between Bayesians and frequentists. The Bayesian view is the more relevant one when learning statistics for data science.
In short, frequentists use probability only to model sampling processes. This means they only assign probabilities to describe data they have already collected.
Bayesians use probability to model sampling processes and also to quantify uncertainty before collecting any data.
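The difference can be sketched with a conversion-rate estimate. The example below assumes a Beta prior combined with binomial data, a standard textbook pairing chosen here for simplicity. All numbers are hypothetical.

```python
# Prior Beta(a, b): uncertainty about the rate BEFORE any data is collected.
prior_a, prior_b = 2, 8          # assumed prior belief: a rate of roughly 20%

# Hypothetical observed data: 30 conversions out of 100 visitors.
conversions, visitors = 30, 100

# Conjugate update: add successes to a and failures to b.
post_a = prior_a + conversions
post_b = prior_b + (visitors - conversions)

prior_mean = prior_a / (prior_a + prior_b)
posterior_mean = post_a / (post_a + post_b)

# The frequentist point estimate uses only the collected data.
frequentist_estimate = conversions / visitors

print(f"prior mean:           {prior_mean:.3f}")
print(f"posterior mean:       {posterior_mean:.3f}")
print(f"frequentist estimate: {frequentist_estimate:.3f}")
```

The posterior mean lands between the prior belief and the sample proportion, and it moves closer to the data as more observations arrive.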
Introduction to Statistical machine learning:
If we want to learn statistics for data science, statistical machine learning is the best next step after the core concepts and Bayesian thinking. The statistics and machine learning fields are closely related, and statistical machine learning is the main approach to modern machine learning.
Math needed for data science:
The amount of math needed depends on the data scientist's role. Data science is essentially an expanded version of statistics and mathematics, combined with programming and business logic.
Here are the three steps to learn math for data science:
- Linear Algebra
- Numerical Analysis
- Calculus for data science
Linear algebra:
Many machine learning concepts are tied to linear algebra. For example, principal component analysis requires eigenvalues, and regression requires matrix multiplication. Most machine learning applications deal with high-dimensional data, and this type of data is represented by matrices.
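As a rough illustration of how eigenvalues and matrix multiplication fit together, the sketch below uses power iteration to estimate the largest eigenvalue of a small symmetric matrix. In principal component analysis that eigenvalue's eigenvector is the direction of greatest variance. The matrix values and the helper names `matvec` and `dominant_eigen` are hypothetical, and a real project would normally rely on a linear algebra library instead.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def dominant_eigen(matrix, iterations=100):
    """Power iteration: estimate the largest eigenvalue and its eigenvector."""
    vector = [1.0] * len(matrix)
    for _ in range(iterations):
        product = matvec(matrix, vector)
        norm = sum(x * x for x in product) ** 0.5
        vector = [x / norm for x in product]   # keep the vector at unit length
    # Rayleigh quotient v^T A v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(v * av for v, av in zip(vector, matvec(matrix, vector)))
    return eigenvalue, vector

# Hypothetical 2x2 covariance matrix of two features (made-up numbers).
covariance = [[4.0, 1.0],
              [1.0, 3.0]]

eigenvalue, direction = dominant_eigen(covariance)
print(f"largest eigenvalue: {eigenvalue:.4f}")
print(f"principal direction: {direction}")
```

Each iteration is just a matrix-vector multiplication followed by a rescaling, which is why the two ideas in the paragraph above appear together so often.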
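Numerical analysis:
Numerical analysis is about determining whether an algorithm will work in practice. Computers do not store most real numbers exactly, so small rounding errors creep into calculations and can accumulate. Numerical analysis helps you understand where, and by how much, a computation loses accuracy.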
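A minimal sketch of inexact storage in practice, using only Python's standard library: the decimal value 0.1 has no exact binary representation, so simple sums can drift slightly.

```python
import math

# 0.1, 0.2 and 0.3 are all stored as nearby binary fractions.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead

# Rounding errors also accumulate when many values are added one by one.
values = [0.1] * 10
naive_total = 0.0
for v in values:
    naive_total += v

# math.fsum tracks the lost low-order bits and returns an accurately rounded sum.
exact_total = math.fsum(values)

print(naive_total == 1.0)  # False
print(exact_total == 1.0)  # True
```

Comparing floats with a tolerance and using error-compensated summation are two simple habits that follow from this.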
Calculus for Data science:
Calculus is the mathematical study of continuous change and is important for several machine learning applications. It has two major branches, differential calculus and integral calculus. These two branches are related to each other by the fundamental theorem of calculus.
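The link between the two branches can be checked numerically. The sketch below approximates a derivative with a central difference and an integral with the trapezoidal rule. The helper names `derivative` and `integral`, the step sizes and the choice of sine and cosine are assumptions made for this illustration.

```python
import math

def derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def integral(f, a, b, steps=10_000):
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    inner = sum(f(a + i * width) for i in range(1, steps))
    return width * (0.5 * f(a) + inner + 0.5 * f(b))

a, b = 0.0, math.pi / 2

# Differential calculus: the slope of sin at 0 should be close to cos(0).
slope = derivative(math.sin, a)

# Integral calculus: the area under cos between a and b.
area = integral(math.cos, a, b)

# Fundamental theorem: integrating the derivative of sin recovers sin(b) - sin(a).
print(f"slope of sin at 0:      {slope:.6f}")
print(f"integral of cos on a,b: {area:.6f}")
print(f"sin(b) - sin(a):        {math.sin(b) - math.sin(a):.6f}")
```

The last two printed values agree closely, which is the fundamental theorem of calculus seen through numerical approximation.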