# ML, Statistics and Mathematics

Data Science Asked by ranit.b on December 20, 2020

I have just started getting my feet wet in ML, and every time I try to delve deeper into the concepts or code, I run into the mathematics and its cryptic notation.
Coming from a Computer Science background, I understand a bit of it, but most of it goes over my head.

I try, and really want, to understand it, but I get confused and give up every time.

I would recommend a TOP-DOWN learning path:

1. get a first grasp of what types of algorithms exist, based on possible use cases (classification, regression, clustering, etc.); this way, you know the WHAT CAN I SOLVE WITH THIS
2. for the algorithms you are interested in (a basic one could be a linear regression trained with a gradient descent optimizer), you can get a first feeling using libraries like scikit-learn, which wrap all the math in between but give you results you can quickly check and play with --> HOW CAN I SOLVE IT
3. after you have played around with it, you can take a deeper look at how the algorithms work, with the linear algebra, statistics and calculus concepts you need to really understand them (basically, the math formulae you mentioned) --> HOW IT WORKS
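As a rough illustration of step 3, here is a minimal pure-Python sketch of the math that a library would otherwise hide: a one-feature linear regression fitted by batch gradient descent on the mean squared error. The function name `fit_linear_regression`, the synthetic data, the learning rate and the number of epochs are all assumptions made for this example; this is not how scikit-learn implements it.

```python
import random


def fit_linear_regression(xs, ys, learning_rate=0.05, n_epochs=2000):
    """Fit y ~ w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_epochs):
        # Prediction errors of the current model on every sample.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Synthetic data (an assumption for the demo): y = 3x + 1 plus a little noise.
random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [3.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]

w, b = fit_linear_regression(xs, ys)
print(f"slope={w:.2f}, intercept={b:.2f}")
```

The recovered slope and intercept should land close to the values used to generate the data, which is a quick way to check that the update rule does what the formulae say.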

Good sources:

• Python Machine Learning book, by Sebastian Raschka (good balance between theory and practice)
• Jason Brownlee blog and books (very applied use cases)
• scikit-learn documentation, which includes the math used in their code

Answered by German C M on December 20, 2020

It is quite true that papers and books use notation that seems obvious to people who are used to dealing with the mathematical aspects, but is meaningless to others. Ways of understanding the math include:

• Following theoretical courses or trainings
• Asking people on forums such as this one, or Cross Validated for stats formulae
• Working it out by yourself by re-reading the parts of the paper/book you didn't get the first time

There are some notations/conventions that are implicitly accepted in data science / machine learning papers, such as:

• Using $$X$$ as input, $$y$$ as output, $$\theta$$ as model parameters
• Using $$\hat{y}$$ for the estimator of the true $$y$$
• Assuming that vectors are column vectors

The list would be too long to include here.
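To show how these conventions fit together, here is a linear model written in that notation. The choice of a linear model and the dimension symbols $$n$$ (number of samples) and $$d$$ (number of features) are assumptions made for illustration only.

```latex
\hat{y} = X\theta, \qquad
X \in \mathbb{R}^{n \times d}, \quad
\theta \in \mathbb{R}^{d}, \quad
y,\ \hat{y} \in \mathbb{R}^{n}
```

Because vectors are taken to be column vectors, $$\theta$$ has $$d$$ rows and the product $$X\theta$$ yields one prediction per row of $$X$$.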

Regarding the example above, what we face is a constrained optimization problem.

The $$\max$$ statement means that we are looking for the maximum value of the expression that follows. What is written below the $$\max$$ (namely, the $$\Delta_{ij}$$ values) is the list of "free" parameters whose values change the value of the expression.

The $$\max$$ statement is prefixed by $$\arg$$, which means that we are not interested in the expression's maximum value, but rather in the set of $$\Delta_{ij}$$ values that leads to that maximum.

Then we face an $$\text{s.t.}$$ ("subject to") statement: because this is no ordinary optimization, we also have to respect the constraints listed after $$\text{s.t.}$$. Those can be inequalities, equations, set-membership constraints, etc., either explicit ($$\Delta_{ij} > 0$$) or more implicit.
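The difference between "max", "arg max" and the constraints can be sketched with a small brute-force search in Python. The objective function, the constraint `0 < delta <= 2` and the candidate grid are hypothetical and chosen only to make the three notions visible; they are not taken from the example in the question.

```python
def objective(delta):
    """Hypothetical objective: a concave parabola that peaks at delta = 3."""
    return -(delta - 3.0) ** 2


def is_feasible(delta):
    """Hypothetical constraints (the part after 's.t.'): 0 < delta <= 2."""
    return 0.0 < delta <= 2.0


# Candidate values for the free parameter: -5.0, -4.9, ..., 5.0.
candidates = [i / 10 for i in range(-50, 51)]
feasible = [d for d in candidates if is_feasible(d)]

# "max": the largest value the expression reaches over the feasible candidates.
best_value = max(objective(d) for d in feasible)
# "arg max": the parameter value at which that largest value is reached.
best_delta = max(feasible, key=objective)
# Without the constraints, the arg max moves to the unconstrained peak.
unconstrained_delta = max(candidates, key=objective)

print(best_value, best_delta, unconstrained_delta)
```

Here the constrained arg max sits on the boundary of the feasible region (2.0) rather than at the unconstrained peak (3.0), which is why the constraints cannot be ignored when reading such a formula.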

Answered by Romain Reboulleau on December 20, 2020
