IDC410 Machine Learning

2022-01-11.md

Machine Learning

  1. Data collected from a real world process in form of input output pairs.
  2. ML algorithm gives an approximation to $f$ which gives output for input. $f$ is the target function
  3. $\hat f$ is a family of functions from the Hypothesis Space and the function is chosen by finding the argmin of some cost function.

Structure of Data

But there are an infinite number of input parameters, as the input may be Continuous.

Conditional Probability

Bayes Theorem

\[P(Y\|X) = \frac{P(X\|Y)P(Y)}{P(X)}\]

Conditional Independance Assumption

Given an output Y, all input parametersare independant of each other, i.e

$P(X|y_i) = \prod_j P(x_j|y_i)$

By modelling $P(x_j|y_i)$ as a Gaussian function, we have a total of $2n$ parameters for every $y_i$. In the case of the binary output, there are a total of $2n + 2n + 1$ parameters only, which reduces the complexity of the problem. $P(x_j|y_1) = 2n$ $P(x_j|y_2) = 2n$ $P(y_1) = \theta,\ P(y_2)=1-\theta$

This is called the Naive Bayes algorithm.

Since we are looking at the joint probability distribution, and if we know it, we can generate a sample. Hence this model is also known as a generative sample.

Defining without Conditional Independance

We can also model the entire problem as an N dimensional Gaussian without the [#Conditional Independance Assumption].

For this, we need an N dimensional $\vec{\mu}$ and a covarience matrix which has $\mathcal{O}(n^2)$

So here, we have a total of $\mathcal{O}(n^2)$. Hence, [#Conditional Independance Assumption] makes the parameter space linear in $n$.

Basically, constraining the Hypothesis space reduces the complexity of the ML problem.

Logistic Approach

We can further simplify the model further, by considering

$P(y_i|X) = \frac{1}{1+\exp(\sum\beta_i x_i)}$

This model has only $n+1$ parameters.

Equation of decision boundary

P(y_1|X) = P(y_2|X) $\implies 0 = \sum \beta_i x_i$

That is, our discrimination boundary is a plane. If the data cannot be cut by the plane, then we cannot use this assumption.

Constraint defined on our hypothesis set is known as a Language Bias, and it is our choice which Language Bias to pick.