8/13/2019 logreg
1/26
Binary Choice Models

1. Binary Dependent Variables
2. Probit and Logit Regression
3. Maximum Likelihood estimation
4. Estimation of Binary Models in Eviews
5. Measures of Goodness of Fit
6. Other Limited Dependent Variable Models
7. Exercise
1 Binary Dependent Variables
The variable of interest Y is binary. The two possible outcomes are
labeled as 0 and 1. We want to model Y as a function of explanatory
variables X= (X1, . . . , X p).
Example: Y=employed (1) or unemployed (0); X=educational level,
age, marital status, ...
Example: Y=expansion (1) or recession (0); X=unemployment level,
inflation, ...
Can we still use linear regression?
Then

E[Y|X] = β0 + β1 X1 + . . . + βp Xp

and the OLS fitted values are given by

Ŷ = β̂0 + β̂1 X1 + . . . + β̂p Xp.

Problem: the left-hand side of the above equations takes values between 0 and 1, while the right-hand side may take any value on the real line.

Note that

E[Y|X] = 0 · P(Y = 0|X) + 1 · P(Y = 1|X) = P(Y = 1|X),

so the conditional expected values are conditional probabilities.
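To see the range problem concretely, here is a minimal sketch (with a made-up 0/1 outcome that switches at x = 5) showing that an OLS fit to binary data produces fitted "probabilities" outside [0, 1]:

```python
import numpy as np

# Hypothetical binary data: y switches from 0 to 1 at x = 5.
x = np.linspace(0, 10, 100)
y = (x > 5).astype(float)

# OLS fit of y on a constant and x.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat

# The fitted line leaves the [0, 1] interval at both ends.
print(fitted.min(), fitted.max())
```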
[Figure: scatter plot of the data cloud (x vs. y), with a linear fit and an S-shaped fit.]
2 Probit and Logit Regression
Binary regression model:

P(Y = 1|X) = F(β0 + β1 X1 + . . . + βp Xp)

with

(a) F(u) = 1 / (1 + exp(−u))   (Logit)
(b) F(u) = Φ(u), the standard normal cumulative distribution function   (Probit)
(c) . . .

In all cases, 0 < F(u) < 1 and F is increasing.
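As a quick numerical check of these properties, the sketch below evaluates both link functions on a grid (the helper name `logit_cdf` is illustrative):

```python
import numpy as np
from scipy.stats import norm

def logit_cdf(u):
    """Logistic link: F(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))

u = np.linspace(-4.0, 4.0, 81)
logit_vals = logit_cdf(u)
probit_vals = norm.cdf(u)   # Phi(u), the standard normal CDF

# Both links stay strictly between 0 and 1, are increasing,
# and equal 0.5 at u = 0.  The probit curve is steeper at 0:
# its slope there is phi(0) = 0.399 versus 0.25 for the logistic.
print(logit_cdf(0.0), float(norm.cdf(0.0)))
```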
[Figure: probit and logit fits to the data cloud (x vs. y).]
The difference between the two fits is small; the probit function is steeper.
Prediction
For an observation xi = (xi1, . . . , xip) we predict the probability of success as

P̂(Y = 1|X = xi) = F(β̂0 + β̂1 xi1 + . . . + β̂p xip).

Set ŷi = 1 if P̂(Y = 1|X = xi) > 0.5, and ŷi = 0 otherwise.
(Cut-off values other than 0.5 = 50% are sometimes taken.)
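As an illustration of the cut-off rule, take hypothetical probit coefficients β̂ = (−2, 0.5) and a single regressor (the values are made up for this sketch):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical fitted probit coefficients (beta0_hat, beta1_hat).
beta_hat = np.array([-2.0, 0.5])

def predict(x, cutoff=0.5):
    """Return predicted success probabilities and 0/1 classes."""
    p = norm.cdf(beta_hat[0] + beta_hat[1] * np.asarray(x, dtype=float))
    return p, (p > cutoff).astype(int)

# At x = 4 the index is exactly 0, so the probability is 0.5
# and the prediction stays at 0 under the strict cut-off.
p, y_hat = predict([1.0, 4.0, 7.0])
print(np.round(p, 3), y_hat)
```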
3 Maximum Likelihood Estimation (MLE)
General principle: let L(θ) be the likelihood, or joint density, of the observations y1, . . . , yn, depending on an unknown parameter θ:

L(θ) = ∏_{i=1}^n f(yi, θ)

(this assumes independent observations).
Then the maximum likelihood estimator θ̂ is the value maximizing L(θ):

θ̂ = argmax_θ log L(θ) = argmax_θ ∑_{i=1}^n log f(yi, θ)

Denote Lmax = L(θ̂).
MLE for Bernoulli Variables
Let yi be the outcome of a 0/1 (failure/success) experiment, with p the probability of success. Then f(1, p) = p and f(0, p) = 1 − p, hence

f(yi, p) = p^yi (1 − p)^(1−yi)

The MLE p̂ maximizes

∑_{i=1}^n {yi log(p) + (1 − yi) log(1 − p)}.

It is not difficult to check that p̂ = (1/n) ∑_{i=1}^n yi, the proportion of successes in the sample.
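This closed form can be verified numerically on a made-up 0/1 sample:

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([1, 0, 1, 1, 0, 1, 0, 1], dtype=float)   # hypothetical sample

def neg_loglik(p):
    """Minus the Bernoulli log-likelihood at success probability p."""
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
# The numerical maximizer agrees with the closed form p_hat = mean(y).
print(res.x, y.mean())
```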
MLE for Probit Model
We condition on the explanatory variables, hence keep them fixed.

f(yi, pi) = pi^yi (1 − pi)^(1−yi) with pi = Φ(β0 + β1 Xi1 + . . . + βp Xip)

The MLE β̂ = (β̂0, β̂1, . . . , β̂p) maximizes

∑_{i=1}^n {yi log Φ(β0 + β1 Xi1 + . . . + βp Xip) + (1 − yi) log(1 − Φ(β0 + β1 Xi1 + . . . + βp Xip))}.

The MLE needs to be computed using a numerical algorithm on the computer. (Similarly for the Logit model.)
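A minimal sketch of this numerical maximization, using scipy on simulated data (the true coefficients (−1, 0.8), the sample size, and the seed are made up for the illustration):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulate data from a probit model with (beta0, beta1) = (-1, 0.8).
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = (rng.random(n) < norm.cdf(-1.0 + 0.8 * x)).astype(float)

def neg_loglik(beta):
    """Minus the probit log-likelihood."""
    p = norm.cdf(beta[0] + beta[1] * x)
    p = np.clip(p, 1e-10, 1 - 1e-10)   # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_loglik, x0=np.zeros(2), method="Nelder-Mead")
print(res.x)   # estimates should be close to (-1, 0.8)
```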
If the model is correctly specified, then
1. MLE is consistent and asymptotically normal.
2. MLE is asymptotically the most precise estimator, hence
efficient.
3. Inference (testing, confidence intervals) can be done.
If the model is misspecified, then the MLE may lose the above properties.
4 Estimation of Binary Models in Eviews

(1) We first regress deny on a constant, black and pi rat. In Eviews, within the equation specification, under Estimation Settings we choose Method: BINARY - Binary Choice and select Logit.
Both explanatory variables are highly significant. They have a positive effect on the probability of deny, as expected. They are also jointly highly significant (LR statistic = 152, P-value ≈ 0).
Predictive accuracy (Expectation-Prediction table):
88% of the observations are correctly classified, with a sensitivity of only 4.2% and a specificity of 99.7%. The gain is only 0.25 percentage points w.r.t. a majority forecast (i.e. predicting that all applications are accepted).
(2) Repeat the analysis, now with all predictor variables.
5 Measures of Fit
Pseudo R-squared
Compare the maximized likelihood of the full model with that of an empty model:

M(full): P(Y = 1|X) = F(β0 + β1 X1 + . . . + βp Xp)
M(empty): P(Y = 1|X) = F(β0)

Pseudo R-squared = 1 − log Lmax(Full) / log Lmax(Empty)

(also called McFadden R-squared)
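With made-up maximized log-likelihoods, the computation is simply:

```python
# Hypothetical maximized log-likelihoods (both negative; full >= empty).
loglik_full = -120.5
loglik_empty = -160.0

# McFadden / pseudo R-squared: 1 - log Lmax(Full) / log Lmax(Empty).
mcfadden_r2 = 1 - loglik_full / loglik_empty
print(round(mcfadden_r2, 3))   # 0.247
```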
Likelihood ratio test
The Likelihood Ratio (LR) statistic for H0: β1 = . . . = βp = 0 is

LR = 2 {log Lmax(Full) − log Lmax(Empty)}.

We reject H0 for large values of LR.
The LR statistic can be used to compare any pair of nested models. Suppose that M1 is a submodel of M2, and we want to test H0: M1 = M2. Then, under H0,

LR = 2 {log Lmax(M2) − log Lmax(M1)} →d χ²_k,

where k is the number of restrictions (i.e. the difference in the number of parameters between M2 and M1).
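The statistic and its P-value can be computed with scipy (the log-likelihood values below are made up so that LR = 7.8 with k = 4, matching the worked example):

```python
from scipy.stats import chi2

# Hypothetical maximized log-likelihoods of two nested models.
loglik_m1 = -160.0   # submodel M1
loglik_m2 = -156.1   # larger model M2
k = 4                # number of restrictions

lr = 2 * (loglik_m2 - loglik_m1)
p_value = chi2.sf(lr, df=k)   # upper-tail probability of chi2 with k df
print(round(lr, 1), round(p_value, 3))
```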
In practice, we work with the P-value. For example, if k = 4 and LR = 7.8, then the P-value is P(χ²₄ > 7.8) ≈ 0.099.

[Figure: density of the chi-squared distribution with 4 degrees of freedom; the P-value is the upper-tail area to the right of LR = 7.8.]
Percentage correctly predicted
This is a measure of predictive accuracy, defined as

(1/n) ∑_{i=1}^n I(yi = ŷi).

This is an estimate of the hit rate of the prediction rule.
[This estimate is over-optimistic, since it is based on the estimation sample. It is better to compute it from out-of-sample predictions.]
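As a small illustration, with made-up outcomes and predictions on a hold-out sample:

```python
import numpy as np

# Hypothetical actual outcomes and model predictions on a hold-out sample.
y_actual = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
y_pred   = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])

# Percentage correctly predicted: average of the indicators I(y_i == yhat_i).
pct_correct = np.mean(y_actual == y_pred)
print(pct_correct)   # 0.8
```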
6 Other Limited Dependent Variable Models

Censored regression models
Examples: car expenditures, income of females, ... The value zero will often be observed. Mixture of a discrete (at 0) and a continuous variable (Tobit models).

Truncated regression models
Data above or below a certain threshold are unobserved or censored. These data are not available: we have a selected sample.
Count data
Examples: number of strikes in a firm, number of car accidents, number of children. Poisson-type models.

Multiple choice data
Examples: mode of transport. Multinomial logit/probit.

Ordered response data
Examples: educational level, credit ratings (B/A/AA/. . .). Ordered probit.
7 Exercise
We will analyse the data in the file grade.wf1. We have a sample of students and we want to study the effect of the introduction of a new teaching method, called PSI. The dependent variable is GRADE, indicating whether a student's grade improved or not after the introduction of the new method. The explanatory variables are

PSI: a binary variable indicating whether the student was exposed to the new teaching method or not.
TUCE: the score on a pretest that indicates entering knowledge of the material to be taught.

We will now run a LOGIT regression of GRADE on a constant, PSI and TUCE.
1. Why do we add TUCE to the regression model, if we are only interested in the effect of PSI?
2. Interpret the estimated regression coefficients.
3. Take a student with TUCE = 20. (a) Estimate the probability that he will improve his grade if he follows the PSI method. (b) What is the probability of improving his grade if he does not follow the PSI method? (c) Will this student improve his grade if PSI = 1? (d) Compute the log odds ratio (for improving the grade or not) for this student, once for PSI = 1 and once for PSI = 0. Compute the difference between these two log odds ratios, and compare it with the regression coefficient of PSI.
4. Compute the percentage of correctly classified observations and comment (you can use View/Expectation-Prediction table).
5. The output shows the value of the LR statistic. How is this value computed?
6. Now run a PROBIT regression. Is there much difference between the estimates? And between the percentages of correctly classified observations?
References:
Greene, W.H., Econometric Analysis, 5th edition (2003) Prentice
Hall.
Stock, J.H., Watson, M.W., Introduction to Econometrics, 2nd edition (2007) Pearson.