Intro to Machine Learning presentation

Contents

Slide 2

Recap

What is linear regression?
Why study linear regression?
What can we use it for?
How to perform linear regression?
How to estimate its performance?
t-statistic, F-statistic, p-value, R-squared

Slide 3

Objectives

Extensions of linear regression
Interaction
Polynomial
Classification
Logistic Regression
Confusion Matrix

Slide 4

Potential Problems with Linear Regression

Slide 5

Linear Models

Linear models are relatively simple to describe and implement
They have advantages over other approaches in terms of interpretation or inference

Slide 6

Then why do we need extensions?

Linear regression makes some assumptions that are easily violated in the real world:
Additivity (each predictor's effect is independent of the others)
Linearity

Slide 7

Additive: A Noisy Ferrari vs. A Noisy Kia

Response: User's Preference for a Car
Predictors: Engine Noise, Car Maker

Slide 8

Interaction

One way of extending this model to allow for interaction effects is to include a third predictor, called an interaction term (typically the product of the two original predictors)

Slide 9

Finding Interaction Terms

Domain Knowledge
Automatic search over all possible combinations

Slide 10

Example – Interaction between TV and Radio

A linear regression fit to sales using TV and radio as predictors.
The linear model seems to overestimate sales for instances in which most of the advertising money was spent exclusively on either TV or radio.
It underestimates sales for instances where the budget was split between the two media.

Slide 12

Example (2)

This suggests some synergy or interaction between the two predictors.

Slide 13

Example (3)

After including the interaction term

We can interpret β3 as the increase in the effectiveness of TV advertising for a one-unit increase in radio advertising (or vice versa)
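
The slide's equation was lost in extraction; in the notation of ISLR, where this example comes from, the model with the interaction term is

\text{sales} = \beta_0 + \beta_1\,\text{TV} + \beta_2\,\text{radio} + \beta_3\,(\text{TV} \times \text{radio}) + \varepsilon

Rewriting it as \text{sales} = \beta_0 + (\beta_1 + \beta_3\,\text{radio})\,\text{TV} + \beta_2\,\text{radio} + \varepsilon shows why: the slope on TV grows by β3 for every unit of radio spending.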

Slide 14

Interactions

Hierarchical Principle
If we include an interaction term in our model, we should also include the main effects
Interaction between quantitative and qualitative variables.

Slide 15

Interaction between quantitative and qualitative variables (1)

Slide 16

Interaction between quantitative and qualitative variables (2)

Slide 17

Non-linearity (1)

Slide 18

Non-linearity (2)

Slide 19

Non-linearity (3)

Slide 20

In General

Standard Linear Model

Extend linear regression to settings in which the relationship between the predictors and the response is non-linear

Polynomial Regression
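
The slide's formulas did not survive extraction; reconstructed in the standard notation (an assumption), the two models for a single predictor are

Standard linear model:  Y = \beta_0 + \beta_1 X + \varepsilon
Polynomial extension:   Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_d X^d + \varepsilon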

Slide 21

Polynomial Regression (1)

The Auto data set. For a number of cars, mpg and horsepower are shown. The linear regression fit is shown in orange. The linear regression fit for a model that includes horsepower^2 is shown as a blue curve. The linear regression fit for a model that includes all polynomials of horsepower up to fifth-degree is shown in green.

Slide 22

Polynomial Regression (2)

It is still a Linear Model
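
It is linear in the coefficients β even though it is non-linear in x, so ordinary least squares still applies once the powers of x are added as extra columns. A minimal numpy sketch (the data here is synthetic, standing in for the Auto set's horsepower and mpg; all names are illustrative):

```python
import numpy as np

# Polynomial regression is linear regression on transformed features:
# the model is linear in the coefficients, not in x.
rng = np.random.default_rng(0)
x = rng.uniform(40, 230, size=100)                              # stand-in for horsepower
y = 50 - 0.3 * x + 0.0006 * x**2 + rng.normal(0, 2, size=100)   # stand-in for mpg

# Design matrix with columns [1, x, x^2]
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares: the same estimator as plain linear regression
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimated [beta0, beta1, beta2]
```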

Slide 23

Classification

Response variable is discrete or qualitative
eye color ∈ {brown, blue, green}
email ∈ {spam, ham}
expression ∈ {happy, sad, surprise}
action ∈ {walk, run, jog, jump}

Slide 24

Linear vs. Non-linear

A Classification Example in 2-Dimensions, with Three different Flexibility Levels


Slide 25

Example

The annual incomes and monthly credit card balances of a number of individuals.

The individuals who defaulted on their credit card payments are shown in orange, and those who did not are shown in blue.

Slide 26

What if we treat the problem as follows?
Instead of coding the qualitative response and estimating it from data, what if we directly compute the probability of a sample belonging to a certain class, e.g. Pr(default = Yes | balance)?

For example, one might predict default = Yes for any individual for whom this probability > 0.5

Slide 27

Now, Can we use Linear Regression?

Slide 28

This is what we want

Slide 29

Logistic Regression (1)
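
The slide's formula was lost in extraction; in ISLR's notation, logistic regression models the class probability with the logistic (sigmoid) function

p(X) = \Pr(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}

which stays between 0 and 1 for every value of X, unlike a linear fit.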

Slide 30

Logistic Regression (2)

Slide 31

Parameter Estimation

We need a loss function

Slide 32

Logistic Regression Cost Function (1)
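
The formulas here were lost; a standard way to make the point (an assumed reconstruction) starts from the squared-error cost of linear regression applied to the sigmoid hypothesis:

J(\beta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2}\bigl(p(x_i) - y_i\bigr)^2

Because p(x) is non-linear in \beta, this J(\beta) is non-convex, so gradient descent can get stuck in local minima.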

Slide 33

Logistic Regression Cost Function (2)

Thus we need a different loss function

Slide 34

Logistic Regression Cost Function (3)

We want to have something that looks (behaves) like this:
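
Presumably the slide plotted the two -log branches; the per-class costs with that shape are

\mathrm{Cost}(p(x), y) =
\begin{cases}
-\log(p(x)) & \text{if } y = 1 \\
-\log(1 - p(x)) & \text{if } y = 0
\end{cases}

The cost is zero for a confident correct prediction and grows without bound as the model confidently predicts the wrong class.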

Slide 35

Logistic Regression Cost Function (4)
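
The lost formula here is most likely the combined single-line form of the piecewise cost above:

\mathrm{Cost}(p(x), y) = -y \log(p(x)) - (1 - y) \log(1 - p(x))

Since y \in \{0, 1\}, exactly one of the two terms is active for each sample.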

Slide 36

Logistic Regression Cost Function (5)
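
And here, presumably, the full cost averaged over the m training samples:

J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y_i \log p(x_i) + (1 - y_i) \log\bigl(1 - p(x_i)\bigr) \Bigr]

This cross-entropy cost is convex, and minimizing it is equivalent to maximum-likelihood estimation.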

Slide 37

Parameter Estimation

Now that we have the cost function, how should we use it to estimate the parameters?
Well, we will try to minimize it
This can be done by using Gradient Descent.
You have learned about GD in the last lab, and today you will use it to estimate the parameters of logistic regression
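
A minimal sketch of what that estimation can look like (numpy only; the function and variable names are illustrative, not the lab's actual code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd(X, y, lr=0.1, n_iters=1000):
    """Fit logistic regression coefficients by batch gradient descent.

    X: (m, n) feature matrix; y: (m,) array of 0/1 labels.
    """
    m = X.shape[0]
    Xb = np.column_stack([np.ones(m), X])   # prepend an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        p = sigmoid(Xb @ beta)              # current predicted probabilities
        grad = Xb.T @ (p - y) / m           # gradient of the cross-entropy cost
        beta -= lr * grad                   # descend
    return beta

# Toy usage with one synthetic feature
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = (x[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
print(logistic_gd(x, y))                    # [intercept, slope]
```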

Slide 38

Doing Logistic Regression for Our Example

Slide 39

Predictions (1)

For example, using the coefficient estimates given in Table 4.1, we predict that the default probability for an individual with a balance of $1,000 is 0.00576, which is below 1%.
In contrast, the predicted probability of default for an individual with a balance of $2,000 is much higher, and equals 0.586, or 58.6%.
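
Those numbers follow from plugging ISLR's Table 4.1 estimates (\hat\beta_0 \approx -10.6513, \hat\beta_1 \approx 0.0055) into the logistic function:

\hat{p}(1000) = \frac{e^{-10.6513 + 0.0055 \cdot 1000}}{1 + e^{-10.6513 + 0.0055 \cdot 1000}} \approx 0.00576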

Slide 40

Predictions (2)

Slide 41

Multiple Logistic Regression
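
The slide's equation was lost; in ISLR's notation, the model simply takes more predictors on the linear scale:

\log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p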

Slide 42

Interpreting the results of Logistic Regression

Slide 43

So How to Interpret the Results?

Odds

Log-Odds
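
Reconstructing the lost formulas (ISLR's notation):

\text{Odds:}\quad \frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}

\text{Log-odds (logit):}\quad \log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X

So a one-unit increase in X adds \beta_1 to the log-odds, i.e. multiplies the odds by e^{\beta_1}.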

Slide 44

Interpreting the results of Logistic Regression

Slide 45

Multiclass Classification (1)

One versus All
One versus One

Slide 46

Multiclass Classification (2)

One versus All
A single multiclass problem is transformed into multiple binary classification problems
We end up with multiple classifiers, each of which is trained to recognize one of the classes – one against all other classes
We make a prediction given a new input by running all the classifiers and picking the classifier that predicts a class with the highest probability
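
A minimal sketch of one-versus-all, reusing the hypothetical sigmoid and logistic_gd helpers from the gradient-descent sketch earlier (assumed to be in scope):

```python
import numpy as np

def one_vs_all_fit(X, y, classes):
    # One binary classifier per class: class k against everything else
    return {k: logistic_gd(X, (y == k).astype(float)) for k in classes}

def one_vs_all_predict(X, models):
    Xb = np.column_stack([np.ones(X.shape[0]), X])
    classes = list(models)
    # Each column: P(class k | x) according to classifier k
    probs = np.column_stack([sigmoid(Xb @ models[k]) for k in classes])
    # Pick the class whose classifier is most confident
    return np.array(classes)[probs.argmax(axis=1)]
```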

Slide 47

Multiclass Classification (3)

One versus One
A classifier is constructed for each pair of classes.
When the model makes a prediction, the class that receives the most votes wins.
This method is generally slower than the one-versus-all method, especially when there are a large number of classes.

Slide 48

Classification Metrics

Slide 49

When is Accuracy Not Good Enough?

Slide 50

Some Simple Requirements for a Good Classifier
Better than the average classifier
Better than the majority classifier

Slide 51

An Example Where We Need More than Just Accuracy – Recall and Precision

Slide 52

Confusion Matrix

Binary Response (Yes/No)

True positive: A positive sample correctly classified
False positive: A negative sample classified as positive
True negative: A negative sample correctly classified
False negative: A positive sample classified as negative
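
The same four outcomes arranged as the 2×2 matrix (rows: actual class, columns: predicted class):

                   Predicted positive    Predicted negative
Actual positive    True positive (TP)    False negative (FN)
Actual negative    False positive (FP)   True negative (TN)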

Slide 53

Precision (1)

Fraction of positive predictions that are actually positive
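
In symbols: \text{Precision} = \frac{TP}{TP + FP}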

Slide 54

Recall (1)

Fraction of positive data predicted to be positive
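
In symbols: \text{Recall} = \frac{TP}{TP + FN}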

Slide 55

High Recall Low Precision

Highly Optimistic Model

Predicts almost everything as positive
Uses a very low confidence threshold for positive predictions

Slide 56

High Precision Low Recall

Highly Pessimistic Model

Predicts almost everything as negative
Uses a very high confidence threshold for positive predictions

Slide 57

F-score

Weighted Harmonic Mean of Precision and Recall
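
As a formula (the slide's rendering was lost; \beta weights recall relative to precision, and \beta = 1 gives the usual F1):

F_\beta = \frac{(1 + \beta^2)\,\text{Precision} \cdot \text{Recall}}{\beta^2\,\text{Precision} + \text{Recall}}
\qquad
F_1 = \frac{2\,\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}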
