Intro to machine learning (presentation)

Contents

Slide 2

Recap

What is linear regression?
Why study linear regression?
What can we use it for?
How to perform linear regression?
How to estimate its performance?
t-statistic, F-statistic, p-value, R-squared
Slide 3

Objectives

Extensions of linear regression
Interaction
Polynomial
Classification
Logistic Regression
Confusion Matrix

Slide 4

Potential Problems with Linear Regression

Slide 5

Linear Models

Linear models are relatively simple to describe and implement.
They have advantages over other approaches in terms of interpretation or inference.
Slide 6

Then why do we need extensions?

Linear regression makes some assumptions that are easily violated in the real world:
Additivity (independent predictor effects)
Linearity
Slide 7

Additive: A Noisy Ferrari vs. A Noisy Kia

Response: User's Preference in a Car
Predictors: Engine Noise, Car Maker
Slide 8

Interaction

One way of extending this model to allow for interaction effects is to include a third predictor, called an interaction term.
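For concreteness, with two predictors X1 and X2 the model with an interaction term takes the standard form (notation assumed, not copied from the slide):

Y = β0 + β1·X1 + β2·X2 + β3·X1·X2 + ε

where the product X1·X2 is the interaction term.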
Slide 9

Finding Interaction Terms

Domain Knowledge
Automatic search over all possible combinations

Slide 10

Example – Interaction between TV and Radio

A linear regression fit to sales using TV and radio as predictors.
The linear model seems to overestimate sales for instances in which most of the
advertising money was spent exclusively on either TV or radio.
It underestimates sales for instances where the budget was split between the two media.
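A minimal Python sketch of fitting this model with and without the interaction term, assuming the Advertising data sits in Advertising.csv with columns TV, radio, and sales (file and column names are assumptions):

import pandas as pd
import statsmodels.formula.api as smf

ads = pd.read_csv("Advertising.csv")

# Main effects only: sales ~ TV + radio
base = smf.ols("sales ~ TV + radio", data=ads).fit()

# Adding the TV x radio interaction term
inter = smf.ols("sales ~ TV + radio + TV:radio", data=ads).fit()

# The interaction model should explain noticeably more variance
print(base.rsquared, inter.rsquared)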
Slide 11

Example – Interaction between TV and Radio

(Same figure and caption as Slide 10.)
Slide 12

Example (2)

This suggests some synergy or interaction between the two predictors.

Slide 13

Example (3)

After including the interaction term, we can interpret β3 as the increase in the effectiveness of TV advertising for a one-unit increase in radio advertising (or vice versa).
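One way to see this (standard algebra, not shown on the slide) is to rewrite the fitted model as

sales = β0 + (β1 + β3·radio)·TV + β2·radio + ε

so the slope on TV, namely β1 + β3·radio, grows as the radio budget grows.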
Slide 14

Interactions

Hierarchical Principle
If we include an interaction term in our model, we should also include the main effects.
Interaction between quantitative and qualitative variables.
Slide 15

Interaction between quantitative and qualitative variables (1)

Slide 16

Interaction between quantitative and qualitative variables (2)

Slide 17

Non-linearity (1)

Slide 18

Non-linearity (2)

Slide 19

Non-linearity (3)

Slide 20

In General

Standard Linear Model

Extend linear regression to settings in which the relationship between the predictors and the response is non-linear:

Polynomial Regression
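For reference, the two models side by side (standard notation, assumed rather than copied from the slide):

Standard linear model:   Y = β0 + β1·X + ε
Polynomial regression:   Y = β0 + β1·X + β2·X^2 + … + βd·X^d + ε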

Slide 21

Polynomial Regression (1)

The Auto data set: mpg and horsepower are shown for a number of cars. The linear regression fit is shown in orange. The linear regression fit for a model that includes horsepower^2 is shown as a blue curve. The linear regression fit for a model that includes all polynomials of horsepower up to the fifth degree is shown in green.
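A minimal sketch of the degree-2 fit in Python, assuming the data is in Auto.csv with horsepower and mpg columns (file and column names are assumptions):

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

auto = pd.read_csv("Auto.csv")
X = auto[["horsepower"]].values
y = auto["mpg"].values

# Expand horsepower into [horsepower, horsepower^2] and fit an
# ordinary linear model on the expanded features
X2 = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
quad = LinearRegression().fit(X2, y)
print(quad.intercept_, quad.coef_)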
Slide 22

Polynomial Regression (2)

It is still a Linear Model: the fit is linear in the coefficients, even though it is non-linear in the predictor.

Slide 23

Classification

Response variable is discrete or qualitative:
eye color ∈ {brown, blue, green}
email ∈ {spam, ham}
expression ∈ {happy, sad, surprise}
action ∈ {walk, run, jog, jump}
Slide 24

Linear vs. Non-linear

A Classification Example in 2 Dimensions, with Three Different Flexibility Levels (panels (a), (b), (c))

Slide 25

Example

The annual incomes and monthly credit card balances of a number of individuals. The individuals who defaulted on their credit card payments are shown in orange, and those who did not are shown in blue.
Slide 26

What if we treat the problem as follows?
Instead of coding the qualitative response and estimating it from data, what if we directly compute the probability of a sample belonging to a certain class?

For example, one might predict default = Yes for any individual for whom this probability, Pr(default = Yes | balance), exceeds 0.5.

Slide 27

Now, Can we use Linear Regression?

Slide 28

This is what we want

Slide 29

Logistic Regression (1)
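The single-predictor logistic model (a standard formulation, as in ISLR, since the slide's formula is not reproduced here):

p(X) = Pr(Y = 1 | X) = e^(β0 + β1·X) / (1 + e^(β0 + β1·X))

The output always lies between 0 and 1, so it can be read as a probability.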

Slide 30

Logistic Regression (2)
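A minimal numpy sketch of the logistic (sigmoid) function behind this model (an illustration, not taken from the slide):

import numpy as np

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]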

Slide 31

Parameter Estimation

We need a loss function

Slide 32

Logistic Regression Cost Function (1)
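A natural first attempt (a standard presentation of this step, assumed rather than copied from the slide) is to reuse the squared-error cost from linear regression, with the sigmoid p(x) in place of the linear prediction:

J(β) = (1/n) · Σ_i ( p(x_i) − y_i )^2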

Slide 33

Logistic Regression Cost Function (2)

With the non-linear sigmoid inside, this squared-error cost is non-convex in the parameters, so gradient descent can get stuck in local minima.
Thus we need a different loss function.

Slide 34

Logistic Regression Cost Function (3)

We want to have a cost function that looks (behaves) like this: convex in the parameters, so that gradient descent is guaranteed to reach the global minimum.
Slide 35

Logistic Regression Cost Function (4)
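The standard per-sample logistic loss (a standard formulation, assumed rather than copied from the slide):

Cost(p(x), y) = −log(p(x))        if y = 1
Cost(p(x), y) = −log(1 − p(x))    if y = 0

It heavily penalizes confident wrong predictions: if y = 1 and p(x) → 0, the cost grows without bound.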

Slide 36

Logistic Regression Cost Function (5)
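Both cases combine into the single cross-entropy cost (standard formulation):

J(β) = −(1/n) · Σ_i [ y_i·log(p(x_i)) + (1 − y_i)·log(1 − p(x_i)) ]

This function is convex in β, which is exactly the property asked for on the previous slides.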

Slide 37

Parameter Estimation

Now that we have the cost function, how should we use it to estimate the parameters?
Well, we will try to minimize it.
This can be done using Gradient Descent.
You learned about GD in the last lab, and today you will use it to estimate the parameters of logistic regression.
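A minimal numpy sketch of batch gradient descent for logistic regression (an illustration; function names, learning rate, and iteration count are assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=5000):
    # Prepend a column of ones so beta[0] acts as the intercept
    Xb = np.c_[np.ones(len(X)), X]
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        p = sigmoid(Xb @ beta)          # current predicted probabilities
        grad = Xb.T @ (p - y) / len(y)  # gradient of the mean cross-entropy
        beta -= lr * grad               # descend along the negative gradient
    return beta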
Slide 38

Doing Logistic Regression for Our Example
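A minimal scikit-learn sketch for this example, assuming the Default data sits in Default.csv with a balance column and a default column coded Yes/No (file and column names are assumptions):

import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("Default.csv")
X = df[["balance"]].values
y = (df["default"] == "Yes").astype(int)

# A very large C means essentially no regularization,
# approximating plain maximum-likelihood estimation
clf = LogisticRegression(C=1e6).fit(X, y)
print(clf.intercept_, clf.coef_)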

Slide 39

Predictions (1)

For example, using the coefficient estimates given in Table 4.1, we predict that the default probability for an individual with a balance of $1,000 is 0.00576, which is below 1%.
In contrast, the predicted probability of default for an individual with a balance of $2,000 is much higher: 0.586, or 58.6%.
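The arithmetic behind these numbers, using the coefficient estimates as reported in ISLR's Table 4.1 (β0 = −10.6513, β1 = 0.0055; the values are quoted from the book, not from the slide):

p(1000) = e^(−10.6513 + 0.0055·1000) / (1 + e^(−10.6513 + 0.0055·1000)) ≈ 0.00576
p(2000) = e^(−10.6513 + 0.0055·2000) / (1 + e^(−10.6513 + 0.0055·2000)) ≈ 0.586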
Slide 40

Predictions (2)

Slide 41

Multiple Logistic Regression
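With p predictors, the model keeps the same form (standard formulation, assumed from context):

p(X) = e^(β0 + β1·X1 + … + βp·Xp) / (1 + e^(β0 + β1·X1 + … + βp·Xp))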

Slide 42

Interpreting the results of Logistic Regression

Slide 43

So How to Interpret the Results?

Odds

Log-Odds
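Both quantities follow directly from the logistic model (standard definitions):

Odds:      p(X) / (1 − p(X)) = e^(β0 + β1·X)
Log-odds:  log[ p(X) / (1 − p(X)) ] = β0 + β1·X

So β1 gives the change in the log-odds for a one-unit increase in X, which is the usual way to interpret logistic regression coefficients.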

Slide 44

Interpreting the results of Logistic Regression

Slide 45

Multiclass Classification (1)

One versus All
One versus One

Slide 46

Multiclass Classification (2)

One versus All
A single multiclass problem is transformed into multiple binary classification problems.
We end up with multiple classifiers, each of which is trained to recognize one of the classes – one against all other classes
We make a prediction given a new input by running all the classifiers and picking the classifier that predicts a class with the highest probability
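A minimal scikit-learn sketch of one-versus-all with logistic regression (the synthetic data is purely illustrative, not the slide's example):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class problem
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# One binary logistic classifier per class; predict() scores all of
# them and picks the class with the highest probability
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(ovr.predict(X[:5]))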
Slide 47

Multiclass Classification (3)

One versus One
A classifier is constructed for each pair of classes.
When the model makes a prediction, the class that receives the most votes wins.
This method is generally slower than one versus all, especially when there are many classes: K classes require K(K − 1)/2 pairwise classifiers instead of K.
Slide 48

Classification Metrics

Slide 49

When is Accuracy Not Good Enough?

Slide 50

Some Simple Requirements for a Good Classifier
Better than the average classifier
Better than the majority classifier
Slide 51

An Example Where We Need More than Just Accuracy – Recall and Precision
Slide 52

Confusion Matrix

Binary Response (Yes/No)

True positive: a positive sample correctly classified
False positive: a negative sample classified as positive
True negative: a negative sample correctly classified
False negative: a positive sample classified as negative
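In the standard 2×2 layout (a reconstruction of the usual table, not copied from the slide):

                 Predicted: Yes    Predicted: No
Actual: Yes      True Positive     False Negative
Actual: No       False Positive    True Negative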
Slide 53

Precision (1)

Fraction of positive predictions that are actually positive
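In terms of confusion-matrix counts (standard definition):

Precision = TP / (TP + FP)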

Slide 54

Recall (1)

Fraction of positive data predicted to be positive
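In terms of confusion-matrix counts (standard definition):

Recall = TP / (TP + FN)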

Slide 55

High Recall Low Precision

Highly Optimistic Model
Predicts almost everything as positive
Uses a very low confidence threshold for positive predictions
Slide 56

High Precision Low Recall

Highly Pessimistic Model
Predicts almost everything as negative
Uses a very high confidence threshold for positive predictions
Slide 57

F-score

Weighted harmonic mean between Precision and Recall
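The standard formulas (assumed from the title, not copied from the slide), where β weights recall relative to precision:

Fβ = (1 + β^2) · Precision · Recall / (β^2 · Precision + Recall)

With β = 1 this is the usual F1 score:

F1 = 2 · Precision · Recall / (Precision + Recall)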
