2022-01-11.md
Machine Learning
- Data is collected from a real-world process in the form of input-output pairs $(x_i, y_i)$.
- An ML algorithm gives an approximation $\hat{f}$ to $f$, which gives the output $y$ for an input $x$. $f$ is the target function.
- $\hat{f}$ is drawn from a family of functions $\mathcal{H}$, the Hypothesis Space, and the function is chosen by finding the argmin of some cost function.
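
A minimal sketch of picking a function by argmin, assuming the hypothesis space is the family of lines $ax + b$ restricted to a coarse grid and the cost is mean squared error. The data, the grid, and the names `cost` and `hypothesis_space` are illustrative assumptions, not part of the lecture.

```python
from itertools import product

# Hypothetical input/output pairs, generated here from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

def cost(params, pairs):
    """Mean squared error of the line a*x + b on the given pairs."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in pairs) / len(pairs)

# Hypothesis space: lines whose slope and intercept lie on a coarse grid.
grid = [i * 0.5 for i in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
hypothesis_space = list(product(grid, grid))

# The learned function is the argmin of the cost over the hypothesis space.
best = min(hypothesis_space, key=lambda p: cost(p, data))
print(best)
```

A finer grid, or a different family of functions, is a different hypothesis space and may give a different answer on the same data.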
Structure of Data
- Most data can be represented as a table of input columns and an output column.
- Data can be of the following types:
    - Continuous Variable
        - We want a PDF
        - The number of parameters depends on the PDF
    - Categorical Variable
        - We want the probability distribution for the variable
        - There is no obvious order on them
    - Ordinal Variable
But there are an infinite number of input parameters, as the input may be Continuous.
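
A small sketch of what "parameters" means for a continuous and a categorical column, assuming a Gaussian PDF for the continuous one. The column names and values are made up for illustration.

```python
from collections import Counter
from statistics import mean, pvariance

# Hypothetical columns of a small data table.
heights = [150.0, 160.0, 170.0, 180.0]      # continuous variable
colours = ["red", "blue", "red", "green"]   # categorical variable

# Continuous: under a Gaussian assumption, two parameters describe the column.
mu = mean(heights)
var = pvariance(heights)

# Categorical: one probability per category, estimated by relative frequency.
counts = Counter(colours)
probs = {c: n / len(colours) for c, n in counts.items()}

print(mu, var)
print(probs)
```

A different choice of PDF for the continuous column would change how many parameters need to be estimated.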
Conditional Probability
- We want to estimate the Conditional Probability $P(Y \mid X)$.
- This is called a discriminative model.
- For a binary categorical output, it is a Bernoulli distribution.
Bayes Theorem
$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$
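
A worked numerical sketch of Bayes' theorem for a binary output $Y$ and a binary input $X$. The probabilities are hypothetical numbers chosen only to show the arithmetic.

```python
# Hypothetical numbers for a binary output Y and a binary input X.
p_y = {0: 0.7, 1: 0.3}             # prior P(Y)
p_x1_given_y = {0: 0.2, 1: 0.9}    # likelihood P(X = 1 | Y)

# Evidence P(X = 1), obtained by summing over both values of Y.
p_x1 = sum(p_x1_given_y[y] * p_y[y] for y in p_y)

# Posterior P(Y | X = 1) = P(X = 1 | Y) P(Y) / P(X = 1).
posterior = {y: p_x1_given_y[y] * p_y[y] / p_x1 for y in p_y}

print(posterior)
```

Observing $X = 1$ raises the probability of $Y = 1$ above its prior of 0.3, because $X = 1$ is much more likely under $Y = 1$.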
Conditional Independence Assumption
Given an output $Y$, all input parameters are independent of each other, i.e. $P(X_1, \dots, X_N \mid Y) = \prod_{i=1}^{N} P(X_i \mid Y)$.
By modelling each $P(X_i \mid Y)$ as a Gaussian function, we have a total of $2N$ parameters (a mean and a variance per input) for every value of $Y$.
In the case of the binary output, there are a total of $4N$ parameters only, which reduces the complexity of the problem.
This is called the Naive Bayes algorithm.
Since we are modelling the joint probability distribution $P(X, Y)$, once we know it we can generate samples from it. Hence this model is also known as a generative model.
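
A minimal sketch of Gaussian Naive Bayes in plain Python, assuming each input is modelled by an independent Gaussian per class and that every class has non-zero variance in every input. The function names (`fit`, `predict`, `sample`) and the toy data are illustrative assumptions.

```python
import math
import random
from statistics import mean, pvariance

def fit(X, y):
    """Estimate a prior and a (mean, variance) pair per input for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = [(mean(col), pvariance(col)) for col in zip(*rows)]
        model[label] = (len(rows) / len(X), stats)
    return model

def log_gaussian(v, mu, var):
    """Log density of a one-dimensional Gaussian (requires var > 0)."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mu) ** 2 / var)

def predict(model, x):
    """Return the class maximising log P(Y) + sum_i log P(x_i | Y)."""
    def score(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            log_gaussian(v, mu, var) for v, (mu, var) in zip(x, stats))
    return max(model, key=score)

def sample(model, rng):
    """Generate one (x, y) pair: draw Y from the prior, then each X_i given Y."""
    labels = list(model)
    label = rng.choices(labels, weights=[model[l][0] for l in labels])[0]
    x = [rng.gauss(mu, math.sqrt(var)) for mu, var in model[label][1]]
    return x, label

# Hypothetical toy data: two inputs, binary output.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.7], [2.8, 0.3]]
y = [0, 0, 0, 1, 1, 1]

model = fit(X, y)
print(predict(model, [1.1, 2.0]))
print(sample(model, random.Random(0)))
```

`predict` only needs the posterior up to a constant, so $P(X)$ is never computed. `sample` is what makes the model generative: it draws new data from the fitted joint distribution.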
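Defining $P(X \mid Y)$ without Conditional Independence
We can also model the entire problem as an $N$-dimensional Gaussian without the Conditional Independence Assumption.
For this, we need an $N$-dimensional mean vector and an $N \times N$ covariance matrix, which has $N(N+1)/2$ distinct entries because it is symmetric.
So here, we have a total of $N + N(N+1)/2$ parameters for every value of $Y$, which is quadratic in $N$. Hence, the Conditional Independence Assumption makes the parameter space linear in $N$.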
Basically, constraining the Hypothesis space reduces the complexity of the ML problem.
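
A quick count of how the two hypothesis spaces grow with the number of inputs. This sketch assumes Gaussian class-conditionals, counts parameters per class, ignores the prior, and counts only the distinct entries of the symmetric covariance matrix. The function names are illustrative.

```python
def naive_bayes_params(n):
    """Per class: one mean and one variance for each of the n inputs."""
    return 2 * n

def full_gaussian_params(n):
    """Per class: n means plus the n(n+1)/2 distinct covariance entries."""
    return n + n * (n + 1) // 2

for n in (2, 10, 100):
    print(n, naive_bayes_params(n), full_gaussian_params(n))
```

For 100 inputs this is 200 parameters against 5150, which is the linear versus quadratic gap.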
Logistic Approach
We can simplify the model further by considering $P(y_1 \mid X) = \dfrac{1}{1 + e^{-(w^\top X + b)}}$, with $P(y_2 \mid X) = 1 - P(y_1 \mid X)$.
This model has only $N + 1$ parameters.
Equation of decision boundary
$$P(y_1 \mid X) = P(y_2 \mid X) \iff w^\top X + b = 0$$
That is, our decision boundary is a plane. If the data cannot be cut by the plane, then we cannot use this assumption.
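
A small sketch of the decision boundary, assuming the logistic form $P(y_1 \mid X) = \sigma(w^\top X + b)$ with weights $w$ and bias $b$. The parameter values and test points are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_y1(x, w, b):
    """P(y_1 | x) under a logistic model with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical parameters: N = 2 weights plus one bias, so N + 1 = 3 in total.
w, b = [1.0, -2.0], 0.5

on_boundary = [1.5, 1.0]   # 1.0*1.5 - 2.0*1.0 + 0.5 = 0
side_one = [3.0, 0.0]      # w.x + b = 3.5 > 0
side_two = [0.0, 3.0]      # w.x + b = -5.5 < 0

for point in (on_boundary, side_one, side_two):
    print(point, p_y1(point, w, b))
```

Points with $w^\top X + b = 0$ get probability exactly 0.5 for both classes. The sign of $w^\top X + b$ decides the class everywhere else.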
A constraint defined on our hypothesis space is known as a Language Bias, and it is our choice which Language Bias to pick.