Intro to Machine Learning presentation

Contents

Slide 2

Recap

What is linear regression?
Why study linear regression?
What can we use it for?
How to perform linear regression?
How to estimate its performance?
t-statistic, F-statistic, p-value, R-squared

Slide 3

Objectives

Extensions of linear regression
Interaction
Polynomial
Classification
Logistic Regression
Confusion Matrix

Slide 4

Potential Problems with Linear Regression

Slide 5

Linear Models

Linear models are relatively simple to describe and implement
They have advantages over other approaches in terms of interpretation or inference

Slide 6

Then why do we need extensions?

Linear regression makes some assumptions that are easily violated in the real world:
Additivity (each predictor's effect is independent of the others)
Linearity

Slide 7

Additive: A Noisy Ferrari vs. A Noisy Kia

Response: User's Preference for a Car
Predictors: Engine Noise, Car Maker

Slide 8

Interaction

One way of extending this model to allow for interaction effects is to include a third predictor, called an interaction term (typically the product of the two original predictors)

Slide 9

Finding Interaction Terms

Domain Knowledge
Automatic search over all possible combinations

Slide 10

Example – Interaction between TV and Radio

A linear regression fit to sales using TV and radio as predictors.
The linear model seems to overestimate sales for instances in which most of the advertising money was spent exclusively on either TV or radio.
It underestimates sales for instances where the budget was split between the two media.

Slide 12

Example (2)

This suggests some synergy or interaction between the two predictors.

Slide 13

Example (3)

After including the interaction term

We can interpret β3 as the increase in the effectiveness of TV advertising for a one-unit increase in radio advertising (or vice versa)
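
The slide's equation was lost in extraction; in the notation of ISLR, where this example comes from, the model with the interaction term is

\text{sales} = \beta_0 + \beta_1\,\text{TV} + \beta_2\,\text{radio} + \beta_3\,(\text{TV} \times \text{radio}) + \varepsilon

Rewriting it as \text{sales} = \beta_0 + (\beta_1 + \beta_3\,\text{radio})\,\text{TV} + \beta_2\,\text{radio} + \varepsilon shows why: the slope on TV grows by β3 for every unit of radio spending.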

Slide 14

Interactions

Hierarchical Principle
If we include an interaction term in our model, we should also include the main effects
Interaction between quantitative and qualitative variables.

Slide 15

Interaction between quantitative and qualitative variables (1)

Slide 16

Interaction between quantitative and qualitative variables (2)

Slide 17

Non-linearity (1)

Slide 18

Non-linearity (2)

Slide 19

Non-linearity (3)

Slide 20

In General

Standard Linear Model

Extend linear regression to settings in which the relationship between the predictors and the response is non-linear

Polynomial Regression
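
The slide's formulas did not survive extraction; reconstructed in the standard notation (an assumption), the two models for a single predictor are

Standard linear model:  Y = \beta_0 + \beta_1 X + \varepsilon
Polynomial extension:   Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_d X^d + \varepsilon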

Slide 21

Polynomial Regression (1)

The Auto data set. For a number of cars, mpg and horsepower are shown. The linear regression fit is shown in orange. The linear regression fit for a model that includes horsepower^2 is shown as a blue curve. The linear regression fit for a model that includes all polynomials of horsepower up to fifth-degree is shown in green.

Slide 22

Polynomial Regression (2)

It is still a Linear Model
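
It is linear in the coefficients β even though it is non-linear in x, so ordinary least squares still applies once the powers of x are added as extra columns. A minimal numpy sketch (the data here is synthetic, standing in for the Auto set's horsepower and mpg; all names are illustrative):

```python
import numpy as np

# Polynomial regression is linear regression on transformed features:
# the model is linear in the coefficients, not in x.
rng = np.random.default_rng(0)
x = rng.uniform(40, 230, size=100)                              # stand-in for horsepower
y = 50 - 0.3 * x + 0.0006 * x**2 + rng.normal(0, 2, size=100)   # stand-in for mpg

# Design matrix with columns [1, x, x^2]
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares: the same estimator as plain linear regression
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimated [beta0, beta1, beta2]
```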

Slide 23

Classification

Response variable is discrete or qualitative
eye color ∈ {brown, blue, green}
email ∈ {spam, ham}
expression ∈ {happy, sad, surprise}
action ∈ {walk, run, jog, jump}

Slide 24

Linear vs. Non-linear

A Classification Example in 2-Dimensions, with Three different Flexibility Levels


Slide 25

Example

The annual incomes and monthly credit card balances of a number of individuals.

The individuals who defaulted on their credit card payments are shown in orange, and those who did not are shown in blue.

Slide 26

What if we treat the problem as follows?
Instead of coding the qualitative response and estimating it from data, what if we directly compute the probability of a sample belonging to a certain class, e.g. Pr(default = Yes | balance)?

For example, one might predict default = Yes for any individual for whom this probability > 0.5

Slide 27

Now, Can we use Linear Regression?

Slide 28

This is what we want

Slide 29

Logistic Regression (1)
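
The slide's formula was lost in extraction; in ISLR's notation, logistic regression models the class probability with the logistic (sigmoid) function

p(X) = \Pr(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}

which stays between 0 and 1 for every value of X, unlike a linear fit.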

Slide 30

Logistic Regression (2)

Slide 31

Parameter Estimation

We need a loss function

Slide 32

Logistic Regression Cost Function (1)
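
The formulas here were lost; a standard way to make the point (an assumed reconstruction) starts from the squared-error cost of linear regression applied to the sigmoid hypothesis:

J(\beta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2}\bigl(p(x_i) - y_i\bigr)^2

Because p(x) is non-linear in \beta, this J(\beta) is non-convex, so gradient descent can get stuck in local minima.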

Slide 33

Logistic Regression Cost Function (2)

Thus we need a different loss function

Slide 34

Logistic Regression Cost Function (3)

We want to have something that looks (behaves) like this:
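
Presumably the slide plotted the two -log branches; the per-class costs with that shape are

\mathrm{Cost}(p(x), y) =
\begin{cases}
-\log(p(x)) & \text{if } y = 1 \\
-\log(1 - p(x)) & \text{if } y = 0
\end{cases}

The cost is zero for a confident correct prediction and grows without bound as the model confidently predicts the wrong class.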

Slide 35

Logistic Regression Cost Function (4)
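
The lost formula here is most likely the combined single-line form of the piecewise cost above:

\mathrm{Cost}(p(x), y) = -y \log(p(x)) - (1 - y) \log(1 - p(x))

Since y \in \{0, 1\}, exactly one of the two terms is active for each sample.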

Slide 36

Logistic Regression Cost Function (5)
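
And here, presumably, the full cost averaged over the m training samples:

J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y_i \log p(x_i) + (1 - y_i) \log\bigl(1 - p(x_i)\bigr) \Bigr]

This cross-entropy cost is convex, and minimizing it is equivalent to maximum-likelihood estimation.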

Slide 37

Parameter Estimation

Now that we have the cost function, how should we use it to estimate the parameters?
Well, we will try to minimize it
This can be done by using Gradient Descent.
You have learned about GD in the last lab, and today you will use it to estimate the parameters of logistic regression
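
A minimal sketch of what that estimation can look like (numpy only; the function and variable names are illustrative, not the lab's actual code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd(X, y, lr=0.1, n_iters=1000):
    """Fit logistic regression coefficients by batch gradient descent.

    X: (m, n) feature matrix; y: (m,) array of 0/1 labels.
    """
    m = X.shape[0]
    Xb = np.column_stack([np.ones(m), X])   # prepend an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        p = sigmoid(Xb @ beta)              # current predicted probabilities
        grad = Xb.T @ (p - y) / m           # gradient of the cross-entropy cost
        beta -= lr * grad                   # descend
    return beta

# Toy usage with one synthetic feature
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = (x[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
print(logistic_gd(x, y))                    # [intercept, slope]
```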

Slide 38

Doing Logistic Regression for Our Example

Slide 39

Predictions (1)

For example, using the coefficient estimates given in Table 4.1, we predict that the default probability for an individual with a balance of $1,000 is 0.00576, which is below 1%.
In contrast, the predicted probability of default for an individual with a balance of $2,000 is much higher, and equals 0.586, or 58.6%.
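
Those numbers follow from plugging ISLR's Table 4.1 estimates (\hat\beta_0 \approx -10.6513, \hat\beta_1 \approx 0.0055) into the logistic function:

\hat{p}(1000) = \frac{e^{-10.6513 + 0.0055 \cdot 1000}}{1 + e^{-10.6513 + 0.0055 \cdot 1000}} \approx 0.00576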

Slide 40

Predictions (2)

Slide 41

Multiple Logistic Regression
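
The slide's equation was lost; in ISLR's notation, the model simply takes more predictors on the linear scale:

\log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p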

Slide 42

Interpreting the results of Logistic Regression

Slide 43

So How to Interpret the Results?

Odds

Log-Odds
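
Reconstructing the lost formulas (ISLR's notation):

\text{Odds:}\quad \frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}

\text{Log-odds (logit):}\quad \log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X

So a one-unit increase in X adds \beta_1 to the log-odds, i.e. multiplies the odds by e^{\beta_1}.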

Slide 44

Interpreting the results of Logistic Regression

Slide 45

Multiclass Classification (1)

One versus All
One versus One

Slide 46

Multiclass Classification (2)

One versus All
A single multiclass problem is transformed into multiple binary classification problems
We end up with multiple classifiers, each of which is trained to recognize one of the classes – one against all other classes
We make a prediction given a new input by running all the classifiers and picking the classifier that predicts a class with the highest probability
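
A minimal sketch of one-versus-all, reusing the hypothetical sigmoid and logistic_gd helpers from the gradient-descent sketch earlier (assumed to be in scope):

```python
import numpy as np

def one_vs_all_fit(X, y, classes):
    # One binary classifier per class: class k against everything else
    return {k: logistic_gd(X, (y == k).astype(float)) for k in classes}

def one_vs_all_predict(X, models):
    Xb = np.column_stack([np.ones(X.shape[0]), X])
    classes = list(models)
    # Each column: P(class k | x) according to classifier k
    probs = np.column_stack([sigmoid(Xb @ models[k]) for k in classes])
    # Pick the class whose classifier is most confident
    return np.array(classes)[probs.argmax(axis=1)]
```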

Slide 47

Multiclass Classification (3)

One versus One
A classifier is constructed for each pair of classes.
When the model makes a prediction, the class that receives the most votes wins.
This method is generally slower than the one-versus-all method, especially when there are a large number of classes.

Slide 48

Classification Metrics

Slide 49

When is Accuracy Not Good Enough?

Slide 50

Some Simple Requirements for a Good Classifier
Better than the average classifier
Better than the majority classifier

Slide 51

An Example Where We Need More than Just Accuracy – Recall and Precision

Slide 52

Confusion Matrix

Binary Response (Yes/No)

True positive: A positive sample correctly classified
False positive: A negative sample classified as positive
True negative: A negative sample correctly classified
False negative: A positive sample classified as negative
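
The same four outcomes arranged as the 2×2 matrix (rows: actual class, columns: predicted class):

                   Predicted positive    Predicted negative
Actual positive    True positive (TP)    False negative (FN)
Actual negative    False positive (FP)   True negative (TN)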

Slide 53

Precision (1)

Fraction of positive predictions that are actually positive
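
In symbols: \text{Precision} = \frac{TP}{TP + FP}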

Slide 54

Recall (1)

Fraction of positive data predicted to be positive
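
In symbols: \text{Recall} = \frac{TP}{TP + FN}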

Slide 55

High Recall Low Precision

Highly Optimistic Model

Predicts almost everything as positive
Uses a very low confidence threshold for positive predictions

Slide 56

High Precision Low Recall

Highly Pessimistic Model

Predicts almost everything as negative
Uses a very high confidence threshold for positive predictions

Slide 57

F-score

Weighted Harmonic Mean of Precision and Recall
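
As a formula (the slide's rendering was lost; \beta weights recall relative to precision, and \beta = 1 gives the usual F1):

F_\beta = \frac{(1 + \beta^2)\,\text{Precision} \cdot \text{Recall}}{\beta^2\,\text{Precision} + \text{Recall}}
\qquad
F_1 = \frac{2\,\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}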
