
Binary Choice Models

1. Binary Dependent Variables
2. Probit and Logit Regression
3. Maximum Likelihood Estimation
4. Estimating Binary Models in EViews
5. Measures of Goodness of Fit
6. Other Limited Dependent Variable Models
7. Exercise


    1 Binary Dependent Variables

The variable of interest $Y$ is binary. The two possible outcomes are labeled as 0 and 1. We want to model $Y$ as a function of explanatory variables $X = (X_1, \dots, X_p)$.

Example: $Y$ = employed (1) or unemployed (0); $X$ = educational level, age, marital status, ...

Example: $Y$ = expansion (1) or recession (0); $X$ = unemployment level, inflation, ...


Can we still use linear regression?

Then
$$E[Y \mid X] = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$$
and the OLS fitted values are given by
$$\hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_p X_p.$$

Problem: the left-hand side of the above equations takes values between 0 and 1, while the right-hand side may take any value on the real line.

Note that
$$E[Y \mid X] = 0 \cdot P(Y = 0 \mid X) + 1 \cdot P(Y = 1 \mid X) = P(Y = 1 \mid X).$$
The conditional expected values are conditional probabilities.


[Figure: scatter plot of $y$ against $x$ (data cloud), with a linear fit and an S-shaped fit.]


2 Probit and Logit Regression

Binary regression model:
$$P(Y = 1 \mid X) = F(\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p)$$
with

(a) $F(u) = \dfrac{1}{1 + \exp(-u)}$ (Logit)

(b) $F(u) = \Phi(u)$, the standard normal cumulative distribution function (Probit)

(c) ...

In all cases, $0 < F(u) < 1$ and $F$ is increasing.
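As a side illustration, here is a minimal Python sketch of the two link functions (a sketch assuming numpy and scipy are available; the function names are mine):

```python
import numpy as np
from scipy.stats import norm

def logit_cdf(u):
    """Logistic link: F(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))

def probit_cdf(u):
    """Probit link: F(u) = Phi(u), the standard normal CDF."""
    return norm.cdf(u)

u = np.linspace(-4, 4, 9)
# Both links are increasing and map the real line into (0, 1).
print(logit_cdf(u))
print(probit_cdf(u))
```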


[Figure: the same data cloud with probit and logit fits; the difference is small, the probit function being slightly steeper.]


Prediction

For an observation $x_i = (x_{i1}, \dots, x_{ip})$ we predict the probability of success as
$$\hat{P}(Y = 1 \mid X = x_i) = F(\hat\beta_0 + \hat\beta_1 x_{i1} + \dots + \hat\beta_p x_{ip}).$$

Set $\hat{y}_i = 1$ if $\hat{P}(Y = 1 \mid X = x_i) > 0.5$ and zero otherwise.

(Cut-off values other than 0.5 = 50% are sometimes used.)
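A small sketch of this prediction rule in Python (the helper and the numbers are illustrative, not from the slides):

```python
import numpy as np

def predict_class(x, beta, F, cutoff=0.5):
    """Classify as 1 when F(beta_0 + beta_1 x_1 + ...) exceeds the cutoff.

    x    : (n, p) regressor matrix (without the constant)
    beta : (p + 1,) coefficients, intercept first
    F    : link function, e.g. the logistic CDF
    """
    p_hat = F(beta[0] + x @ beta[1:])
    return (p_hat > cutoff).astype(int)

F = lambda u: 1.0 / (1.0 + np.exp(-u))   # logistic link
x = np.array([[1.2], [0.1], [3.5]])      # made-up observations
beta = np.array([-1.0, 0.8])             # made-up estimates
print(predict_class(x, beta, F))         # -> [0 0 1]
```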


3 Maximum Likelihood Estimation (MLE)

General principle: let $L(\theta)$ be the likelihood, or joint density, of the observations $y_1, \dots, y_n$, depending on an unknown parameter $\theta$:
$$L(\theta) = \prod_{i=1}^{n} f(y_i, \theta)$$
(this assumes independent observations).

Then the maximum likelihood estimator $\hat\theta$ is the value of $\theta$ maximizing $L(\theta)$:
$$\hat\theta = \operatorname*{argmax}_{\theta} \log L(\theta) = \operatorname*{argmax}_{\theta} \sum_{i=1}^{n} \log f(y_i, \theta)$$

Denote $L_{\max} = L(\hat\theta)$.


MLE for Bernoulli Variables

Let $y_i$ be the outcome of a 0/1 (failure/success) experiment, with $p$ the probability of success. Then $f(1, p) = p$ and $f(0, p) = 1 - p$, hence
$$f(y_i, p) = p^{y_i} (1 - p)^{1 - y_i}.$$

The MLE $\hat{p}$ maximizes
$$\sum_{i=1}^{n} \{ y_i \log(p) + (1 - y_i) \log(1 - p) \}.$$

It is not difficult to check that $\hat{p} = \frac{1}{n} \sum_{i=1}^{n} y_i$, the percentage of successes in the sample.
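This can also be checked numerically; a minimal sketch with a made-up sample, assuming scipy is available:

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # made-up 0/1 sample

def neg_loglik(p):
    # Negative Bernoulli log-likelihood: -sum{y log p + (1-y) log(1-p)}
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, y.mean())  # both approximately 0.625
```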


MLE for the Probit Model

We will condition on the explanatory variables; hence keep them fixed.
$$f(y_i, p_i) = p_i^{y_i} (1 - p_i)^{1 - y_i} \quad \text{with} \quad p_i = \Phi(\beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip})$$

The MLE $\hat\beta = (\hat\beta_0, \hat\beta_1, \dots, \hat\beta_p)$ maximizes
$$\sum_{i=1}^{n} \{ y_i \log \Phi(\beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip}) + (1 - y_i) \log(1 - \Phi(\beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip})) \}.$$

The MLE needs to be computed using a numerical algorithm on the computer. (Similar for the Logit model.)
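To give an idea of what such a numerical algorithm does, here is a sketch that maximizes the probit log-likelihood on simulated data with a general-purpose optimizer (all names and numbers are illustrative; this is not the EViews routine):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
# Simulate from a probit model with (beta_0, beta_1) = (0.5, 1.0)
y = (rng.normal(size=n) < 0.5 + 1.0 * x).astype(int)

def neg_loglik(beta):
    p = norm.cdf(beta[0] + beta[1] * x)
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)  # should be close to (0.5, 1.0)
```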


If the model is correctly specified, then

1. the MLE is consistent and asymptotically normal;
2. the MLE is asymptotically the most precise estimator, hence efficient;
3. inference (testing, confidence intervals) can be done.

If the model is misspecified, then the MLE may lose the above properties.


(1) We first regress deny on a constant, black and pi_rat. In EViews, within the equation specification, under Estimation Settings we choose Method: BINARY - Binary Choice and select Logit.
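For readers without EViews, a rough Python analogue using statsmodels; the file name hmda.csv and the column names deny, black and pi_rat are assumptions about how the workfile might be exported:

```python
import pandas as pd
import statsmodels.api as sm

# Assumes the EViews workfile was exported to CSV with these column names.
df = pd.read_csv("hmda.csv")
X = sm.add_constant(df[["black", "pi_rat"]])
res = sm.Logit(df["deny"], X).fit()
print(res.summary())
```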


Both explanatory variables are highly significant. They have a positive effect on the probability of denial, as expected. They are also jointly highly significant (LR statistic = 152, P-value ≈ 0).


Predictive accuracy (Expectation-prediction table):

88% of observations are correctly classified, with a sensitivity of only 4.2% and a specificity of 99.7%. The gain is only 0.25 percentage points w.r.t. a majority forecast (i.e. predicting that all applications are accepted).
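A sketch of how these classification measures follow from fitted probabilities (the helper name and the numbers are made up):

```python
import numpy as np

def classification_summary(y, p_hat, cutoff=0.5):
    """Accuracy, sensitivity and specificity of the cutoff rule."""
    y_hat = (p_hat > cutoff).astype(int)
    accuracy = np.mean(y == y_hat)
    sensitivity = np.mean(y_hat[y == 1] == 1)  # share of 1s predicted as 1
    specificity = np.mean(y_hat[y == 0] == 0)  # share of 0s predicted as 0
    return accuracy, sensitivity, specificity

y = np.array([0, 0, 1, 1, 0])                # made-up outcomes
p_hat = np.array([0.1, 0.6, 0.7, 0.4, 0.2])  # made-up fitted probabilities
print(classification_summary(y, p_hat))      # (0.6, 0.5, 0.667)
```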

    (2) Repeat the analysis, now with all predictor variables.


5 Measures of Fit

Pseudo R-squared

Compare the value of the likelihood of the full model with that of an empty model:

M(full): $P(Y = 1 \mid X) = F(\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p)$

M(empty): $P(Y = 1 \mid X) = F(\beta_0)$

$$\text{Pseudo } R^2 = 1 - \frac{\log L_{\max}(\text{Full})}{\log L_{\max}(\text{Empty})}$$

(also called the McFadden R-squared)
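A quick numerical illustration on simulated data (statsmodels reports the same quantity as prsquared):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = (rng.logistic(size=300) < -0.3 + 1.2 * x).astype(int)

full = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
empty = sm.Logit(y, np.ones_like(y)).fit(disp=0)  # intercept-only model
pseudo_r2 = 1 - full.llf / empty.llf              # McFadden R-squared
print(pseudo_r2, full.prsquared)                  # the two values agree
```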


Likelihood ratio test

The Likelihood Ratio (LR) statistic for $H_0: \beta_1 = \dots = \beta_p = 0$ is
$$LR = 2 \{ \log L_{\max}(\text{Full}) - \log L_{\max}(\text{Empty}) \}.$$

We reject $H_0$ for large values of LR.

The LR statistic can be used to compare any pair of nested models. Suppose that M1 is a submodel of M2, and we want to test $H_0: M1 = M2$. Then, under $H_0$:
$$LR = 2 \{ \log L_{\max}(M2) - \log L_{\max}(M1) \} \overset{d}{\to} \chi^2_k,$$
where $k$ is the number of restrictions (i.e. the difference in the number of parameters between M2 and M1).
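A minimal helper for this comparison, assuming two fitted statsmodels results (the function name is mine):

```python
from scipy.stats import chi2

def lr_test(sub, full, k):
    """LR test of a submodel M1 (`sub`) against a full model M2 (`full`);
    k is the number of restrictions."""
    lr = 2 * (full.llf - sub.llf)
    return lr, chi2.sf(lr, df=k)  # statistic and P-value
```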


In practice, we work with the P-value. For example, if $k = 4$ and $LR = 7.8$:

[Figure: density of the chi-squared distribution with 4 degrees of freedom; the area to the right of $LR = 7.8$ is the P-value $\approx 0.099$.]
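This tail probability can be checked directly:

```python
from scipy.stats import chi2

print(chi2.sf(7.8, df=4))  # P-value, approximately 0.099
```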


Percentage correctly predicted

This is a measure of predictive accuracy, defined as
$$\frac{1}{n} \sum_{i=1}^{n} I(y_i = \hat{y}_i).$$

This is an estimate of the correct-classification rate (one minus the error rate) of the prediction rule.

[This estimate is over-optimistic, since it is based on the estimation sample. It is better to compute it using out-of-sample predictions.]
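A sketch of such an out-of-sample check on simulated data (the holdout split and all numbers are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=400)
y = (rng.logistic(size=400) < 0.2 + 1.0 * x).astype(int)

train, test = slice(0, 300), slice(300, 400)  # simple holdout split
fit = sm.Logit(y[train], sm.add_constant(x[train])).fit(disp=0)
p_hat = fit.predict(sm.add_constant(x[test]))
print(np.mean(y[test] == (p_hat > 0.5)))      # out-of-sample hit rate
```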


6 Other Limited Dependent Variable Models

Censored regression models

Examples: car expenditures, income of females, ... The value zero will often be observed. The outcome is a mixture of a discrete variable (at 0) and a continuous variable (Tobit models).

Truncated regression models

Data above or below a certain threshold are not observed at all; these data are not available, so we have a selected sample.


Count data

Examples: number of strikes in a firm, number of car accidents, number of children. Poisson-type models.

Multiple choice data

Example: mode of transport. Multinomial logit/probit.

Ordered response data

Examples: educational level, credit ratings (B/A/AA/...). Ordered probit.


7 Exercise

We will analyse the data in the file grade.wf1. We have a sample of students and we want to study the effect of the introduction of a new teaching method, called PSI. The dependent variable is GRADE, indicating whether a student's grade improved or not after the introduction of the new method. The explanatory variables are:

PSI: a binary variable indicating whether the student was exposed to the new teaching method or not.

TUCE: the score on a pretest that indicates entering knowledge of the material to be taught.

We will now run a logit regression of GRADE on a constant, PSI and TUCE.
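To replicate the exercise outside EViews, a possible Python sketch; the file name grade.csv assumes the workfile has been exported to CSV, and the column names follow the slide:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("grade.csv")             # exported from grade.wf1
X = sm.add_constant(df[["PSI", "TUCE"]])
res = sm.Logit(df["GRADE"], X).fit()
print(res.summary())                      # coefficients, LR statistic
print(res.pred_table())                   # expectation-prediction counts
```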


1. Why do we add TUCE to the regression model if we are only interested in the effect of PSI?

2. Interpret the estimated regression coefficients.

3. Take a student with TUCE = 20. (a) Estimate the probability that he will increase his grade if he follows the PSI method. (b) What is the probability that he increases his grade if he does not follow the PSI method? (c) Will this student improve his grade if PSI = 1? (d) Compute the log odds ratio (for improving the grade or not) for this student, once for PSI = 1 and once for PSI = 0. Compute the difference between these two log odds ratios, and compare it with the regression coefficient of PSI.

4. Compute the percentage of correctly classified observations and comment (you can use View/Expectation-Prediction Table).

5. The output shows the value of the LR statistic. How is this value computed?

6. Now run a probit regression. Is there much difference between the estimates? And in the percentage of correctly classified observations?

