TransWikia.com

Understanding the Stata margins command on dichotomous variables

Cross Validated Asked on November 12, 2021

If I run the margins command on a dichotomous variable, what does the output tell me exactly? On a continuous variable I understand that it tells me the average value for a given category, but if I run it with a dichotomous outcome variable and a categorical independent variable, what does it tell me?

Is there a cut-off at 0.5 so if it’s 0.25 that means that the average value at that level of the categorical variable is closer to 0 than 1, so the result (if significant) says that the effect is significantly lower?

As an example, say I’m looking at cancer sizes and seizures. Cancer size is a 4-level ordinal categorical variable.

I run a logistic regression for seizure no/yes and cancer sizes. I then run the margins command and see that size 2 has a "margin" of 0.25. Does that mean that people are more likely to NOT (no being 0) experience seizures at this level?

And also how is this different than running a logistic regression on each level of the categorical variable dichotomized to dummy variables?

One Answer

It is hard to answer this precisely without seeing what you actually typed in Stata (both your logit specification and your margins command, and do note the correct spelling).

From the verbal description, it sounds like you are

  1. taking the sample used in the estimation of the logit,
  2. predicting Pr(Seizure) as if everyone in that sample had a cancer size of 1, then 2, then 3, and then 4 (instead of their actual observed values),
  3. using the logit coefficients from a model where cancer size is broken up into 3 dummy variables and an intercept.

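This model says that you can expect 1 in 4 people with a cancer size of 2 to have a seizure. The predictions are on a scale of [0,1], so 1 in 4 is 0.25. There is no cut-off at 0.5: the margin is itself the predicted probability of a seizure at that cancer size.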

Here's a reproducible example demonstrating this calculation, where we model the probability of a low-birth-weight baby given the quartile of the mother's age:

. webuse lbw, clear
(Hosmer & Lemeshow data)

. xtile age_qrt = age, nq(4)

. table age_qrt, c(min age max age)

----------------------------------
4         |
quantiles |
of age    |   min(age)    max(age)
----------+-----------------------
        1 |         14          19
        2 |         20          23
        3 |         24          26
        4 |         27          45
----------------------------------

. logit low i.age_qrt, nolog

Logistic regression                             Number of obs     =        189
                                                LR chi2(3)        =       5.50
                                                Prob > chi2       =     0.1383
Log likelihood = -114.58352                     Pseudo R2         =     0.0235

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     age_qrt |
          2  |   .2876821   .4149967     0.69   0.488    -.5256964    1.101061
          3  |   .5389965     .45687     1.18   0.238    -.3564522    1.434445
          4  |  -.5382246   .4822682    -1.12   0.264    -1.483453    .4070036
             |
       _cons |  -.8754687   .3073181    -2.85   0.004    -1.477801   -.2731362
------------------------------------------------------------------------------

. margins age_qrt

Adjusted predictions                            Number of obs     =        189
Model VCE    : OIM

Expression   : Pr(low), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     age_qrt |
          1  |   .2941176   .0638031     4.61   0.000     .1690659    .4191694
          2  |   .3571429   .0640301     5.58   0.000     .2316462    .4826396
          3  |   .4166667   .0821678     5.07   0.000     .2556208    .5777125
          4  |   .1956522   .0584905     3.35   0.001     .0810129    .3102915
------------------------------------------------------------------------------

. /* margins by hand */
. forvalues v=1/4 {
  2.         replace age_qrt=`v'
  3.         predict double phat`v', pr
  4. }
(138 real changes made)
(189 real changes made)
(189 real changes made)
(189 real changes made)

. sum phat*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       phat1 |        189    .2941176           0   .2941176   .2941176
       phat2 |        189    .3571429           0   .3571429   .3571429
       phat3 |        189    .4166667    5.57e-17   .4166667   .4166667
       phat4 |        189    .1956522    2.78e-17   .1956522   .1956522

Here the highest-risk group is the third age quartile, though the differences are probably not significant. Stata is calculating the average of the predicted probabilities over the estimation sample:

$$AM_k = \frac{1}{N}\sum_{i=1}^N \left[ \hat p(x=k) \right].$$
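Because the model contains only the quartile indicators, each margin can also be recovered directly from the coefficients by applying the inverse logit to the intercept plus the relevant quartile coefficient. The following is a minimal sketch of that check, assuming the same `lbw` data and the `age_qrt` variable built above; it reloads the data because the by-hand loop overwrote `age_qrt`. The four displayed values should reproduce the margins table, and the last command shows they coincide with the raw share of low-weight births in each quartile.

```stata
webuse lbw, clear
xtile age_qrt = age, nq(4)
quietly logit low i.age_qrt

/* quartile 1 is the base level, so only the intercept enters */
display invlogit(_b[_cons])

/* quartiles 2-4 add their own coefficient to the intercept */
display invlogit(_b[_cons] + _b[2.age_qrt])
display invlogit(_b[_cons] + _b[3.age_qrt])
display invlogit(_b[_cons] + _b[4.age_qrt])

/* with no other covariates, these equal the raw proportions of low == 1 */
tabstat low, by(age_qrt) statistics(mean)
```

Once other covariates enter the model this shortcut no longer applies, since the predicted probability then differs across observations and the margin is an average over them.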

What you are describing sounds more like marginal effects, which involve comparing how probabilities change as you alter cancer size. These can be calculated like this:

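margins, dydx(age_qrt)
/* margins, dydx() by hand */
/* drop any phat1 and phat3 left over from the earlier loop, otherwise predict stops with an error */
capture drop phat1 phat3
replace age_qrt = 1
predict double phat1, pr
replace age_qrt = 3
predict double phat3, pr
gen double finite_diff3vs1 = phat3 - phat1
sum phat3 phat1 finite_diff3vs1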

The output of the first command is:

. margins, dydx(age_qrt)

Conditional marginal effects                    Number of obs     =        189
Model VCE    : OIM

Expression   : Pr(low), predict()
dy/dx w.r.t. : 2.age_qrt 3.age_qrt 4.age_qrt

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     age_qrt |
          2  |   .0630252   .0903919     0.70   0.486    -.1141396      .24019
          3  |    .122549   .1040306     1.18   0.239    -.0813473    .3264453
          4  |  -.0984655   .0865562    -1.14   0.255    -.2681125    .0711815
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

This says that the expected change in probability associated with going from the lowest to the second-highest age quartile (from 1 to 3 of 4) is 0.122549, so giving birth to a low-weight baby becomes somewhat more likely. This is a 12 percentage point increase, which is a 42% relative increase over the baseline probability of 0.294 (0.1225/0.2941 ≈ 0.42). The note explains that this is a finite difference, and not really a derivative:

$$AME_k = \frac{1}{N}\sum_{i=1}^N \left[ \hat p(x=k)-\hat p(x=\text{baseline}) \right],$$

where $\hat p(\cdot)$ is the predicted probability from the logit model.
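The `dydx()` output only compares each quartile with the base level. To check whether any pair of quartiles differs, one option is to ask `margins` for all pairwise comparisons of the predicted probabilities. This is a sketch under the same assumptions as the example above (the `lbw` data and the `age_qrt` variable), not the only way to test these differences.

```stata
webuse lbw, clear
xtile age_qrt = age, nq(4)
quietly logit low i.age_qrt

/* all pairwise differences in predicted probabilities, with z tests and p-values */
margins age_qrt, pwcompare(effects)

/* joint Wald test that the three quartile coefficients are all zero */
testparm i.age_qrt
```

The pairwise table is on the probability scale, so its contrasts against quartile 1 should match the `dydx()` results, while `testparm` works on the log-odds coefficients.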

Answered by dimitriy on November 12, 2021
