Including the interaction but not the main effects in a model

Question

Is it ever valid to include a two-way interaction in a model without including the main effects?  What if your hypothesis is only about the interaction, do you still need to include the main effects?

Ben · Answer

Various texts on regression will tell you that you should never include an interaction term without the base effects, but that is not correct.  One circumstance where it is appropriate to include an interaction term in your model without a base effect is when you have nested variables in your model.  For example, if you have a regression problem with one indicator variable married and another variable spouse_age, then you would use a model like this:
Response ~ married + married:spouse_age + other_vars

Note here that there is no base effect for spouse_age, since it is only applicable if the person is married.  When the married indicator is zero, the value of spouse_age is a placeholder value (and indeed, it should probably be coded as NA).  If you were to include a base effect for the nested variable, this would bring the irrelevant placeholder value into the regression, which would give incorrect results.

If you would like some additional information on the use of interaction terms for "nested" variables like this, see this related question.
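
To make the nested-variable idea concrete, here is a minimal sketch of the design-matrix rows implied by the formula above. The records and the `design_row` helper are hypothetical, and treating the interaction as 0 for unmarried people is an assumption about how the placeholder is handled, not something a particular library does for you.

```python
# Hypothetical records: spouse_age is missing (None) when married == 0.
records = [
    {"married": 1, "spouse_age": 34},
    {"married": 0, "spouse_age": None},
    {"married": 1, "spouse_age": 51},
    {"married": 0, "spouse_age": None},
]


def design_row(rec):
    """Return [intercept, married, married:spouse_age] for one record."""
    married = rec["married"]
    # The interaction is set to 0 for unmarried people, so whatever
    # placeholder is stored in spouse_age never enters the model.
    interaction = rec["spouse_age"] if married == 1 else 0
    return [1, married, interaction]


design = [design_row(r) for r in records]
for row in design:
    print(row)
```

Because there is no stand-alone spouse_age column, the placeholder has nowhere to leak into the fit; a base effect would need a numeric value for every row, including the unmarried ones.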

Sol Hator · Answer

Is it ever valid to include a two-way interaction without main effect?

Yes, it can be valid and even necessary. If, for example, in 2. you included a factor for the main effect (the average difference between the blue and red conditions), this would make the model worse.

What if your hypothesis is only about the interaction, do you still need to include the main effects?

Your hypothesis might be true independent of there being a main effect. But the model might need it to best describe the underlying process.
So yes, you should try with and without.

Note: In the case of only interaction you need to center the code for the "continuous" independent variable (measurement in the example). Otherwise the interaction coefficients in the model will not be symmetrically distributed (no coefficient for the first measurement in the example).

Ketil B T · Answer

The short answer:
If you include interaction in the fixed effects, then the main effects are automatically included whether or not you specifically include them in your code. The only difference is your parametrization, i.e., what the parameters in your model mean (e.g., are they group means or are they differences from reference levels).

Assumptions:
I assume we are working in the general linear model and are asking when we can use the fixed effects specification $AB$ instead of $A + B + AB$, where $A$ and $B$ are (categorical) factors.

Mathematical clarification: 
We assume that the response vector $Y \sim \mathcal N(\xi, \sigma^2 I_n)$.
If $X_A$, $X_B$ and $X_{AB}$ are the design matrices for the three factors, then a model with "main effects and interaction" corresponds to the restriction $\xi \in \operatorname{span}\{X_A, X_B, X_{AB}\}$.
A model with "only interaction" corresponds to the restriction $\xi \in \operatorname{span}\{X_{AB}\}$.
However, $\operatorname{span}\{X_{AB}\} = \operatorname{span}\{X_A, X_B, X_{AB}\}$. So, it's two different parametrizations of the same model (or the same family of distributions if you are more comfortable with that terminology).
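
The span equality can be checked by hand on a small example. The sketch below assumes $X_{AB}$ holds one indicator column per cell of the two factors and uses a hypothetical 2 x 3 layout with one observation per cell; it only verifies that every main-effect column is a sum of interaction columns.

```python
from itertools import product

levels_a = ["a1", "a2"]
levels_b = ["b1", "b2", "b3"]
# One observation per cell is enough to compare column spaces.
obs = list(product(levels_a, levels_b))


def indicator(match):
    """Column of 0/1 values marking the observations where match(a, b) holds."""
    return [1 if match(a, b) else 0 for a, b in obs]


# Columns of X_AB: one indicator per (a, b) cell.
x_ab = {
    (a, b): indicator(lambda p, q, a=a, b=b: (p, q) == (a, b))
    for a, b in product(levels_a, levels_b)
}
# Columns of X_A and X_B: one indicator per factor level.
x_a = {a: indicator(lambda p, q, a=a: p == a) for a in levels_a}
x_b = {b: indicator(lambda p, q, b=b: q == b) for b in levels_b}


def add_columns(cols):
    """Element-wise sum of several columns."""
    return [sum(vals) for vals in zip(*cols)]


# Each main-effect column is a sum of interaction columns, so it already
# lies in the span of X_AB.
a_in_span = all(
    x_a[a] == add_columns([x_ab[(a, b)] for b in levels_b]) for a in levels_a
)
b_in_span = all(
    x_b[b] == add_columns([x_ab[(a, b)] for a in levels_a]) for b in levels_b
)
print(a_in_span, b_in_span)
```

Adding $X_A$ and $X_B$ to the design therefore cannot enlarge the set of fitted mean vectors; it only changes which parameters are reported.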

I just saw that David Beede provided a very similar answer (apologies), but I thought I would leave this up for those who respond well to a linear algebra perspective.

gaborous · Answer

Yes this can be valid, although it is rare. But in this case you still need to model the main effects, which you will afterward regress out.

Indeed, in some models, only the interaction is interesting, such as drug testing/clinical models. This is for example the basis of the Generalized PsychoPhysiological Interactions (gPPI) model: y = ax + bxh + ch where x/y are voxels/regions of interest and h the block/events designs.

In this model, both a and c will be regressed out, only b will be kept for inference (the beta coefficients). Indeed, both a and c represent spurious activity in our case, and only b represents what cannot be explained by spurious activity, the interaction with the task.

David Beede · Answer

If the variables in question are categorical, then including interactions without the main effects is just a reparameterization of the model, and the choice of parameterization depends on what you are trying to accomplish with your model. Interacting continuous variables with other continuous variables or with categorical variables is a whole different story. See this FAQ from UCLA's Institute for Digital Research and Education.

Answered by David Beede on December 3, 2021

nick michalak · Answer

F = m*a, force equals mass times acceleration.

It is not represented as F = m + a + ma, or some other linear combination of those parameters. Indeed, only the interaction between mass and acceleration would make sense physically.

Hans Landsheer · Answer

There are various processes in nature that involve only an interaction effect, and laws that describe them, for instance Ohm's law. In psychology you have, for instance, the performance model of Vroom (1964): Performance = Ability x Motivation. Now, you might expect to find a significant interaction effect when this law is true. Regrettably, this is not the case. You might easily end up finding two main effects and an insignificant interaction effect (for a demonstration and further explanation see Landsheer, van den Wittenboer and Maassen (2006), Social Science Research 35, 274-294). The linear model is not very well suited for detecting interaction effects; Ohm might never have found his law if he had used linear models.

As a result, interpreting interaction effects in linear models is difficult. If you have a theory that predicts an interaction effect, you should include it even when insignificant. You may want to ignore main effects if your theory excludes those, but you will find that difficult, as significant main effects are often found in the case of a true data generating mechanism that has only a multiplicative effect.

My answer is: Yes, it can be valid to include a two-way interaction in a model without including the main effects. Linear models are excellent tools to approximate the outcomes of a large variety of data generating mechanisms, but their formulas cannot be easily interpreted as a valid description of the data generating mechanism.
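
A small noise-free illustration of why main effects show up under a purely multiplicative mechanism: the sketch below fits an additive model to z = x * y on a hypothetical 5 x 5 grid of positive values. It does not reproduce the demonstration in the cited paper; the grid and variable names are assumptions made only for this example.

```python
from itertools import product

# Hypothetical data: every combination of x, y in 1..5, with z = x * y exactly.
grid = list(product(range(1, 6), range(1, 6)))
x = [a for a, _ in grid]
y = [b for _, b in grid]
z = [a * b for a, b in grid]


def centered(v):
    m = sum(v) / len(v)
    return [e - m for e in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


xc, yc, zc = centered(x), centered(y), centered(z)
sxx, syy, sxy = dot(xc, xc), dot(yc, yc), dot(xc, yc)
sxz, syz, szz = dot(xc, zc), dot(yc, zc), dot(zc, zc)

# Least-squares slopes for z ~ x + y (no product term), from the normal equations.
det = sxx * syy - sxy ** 2
b_x = (sxz * syy - syz * sxy) / det
b_y = (syz * sxx - sxz * sxy) / det
# Share of the variance of z captured by the two main effects alone.
r_squared = (b_x * sxz + b_y * syz) / szz
print(b_x, b_y, r_squared)
```

On this grid the additive fit has slopes of 3 for both predictors and an R-squared of 0.9, even though the data contain nothing but a product.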

rolando2 · Answer

Both x and y will be correlated with xy (unless you have taken a specific measure to prevent this by using centering). Thus if you obtain a substantial interaction effect with your approach, it will likely amount to one or more main effects masquerading as an interaction. This is not going to produce clear, interpretable results. What is desirable is instead to see how much the interaction can explain over and above what the main effects do, by including x, y, and (preferably in a subsequent step) xy.
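
The effect of centering on this correlation can be seen in a short sketch. The 5 x 5 grid of values is hypothetical and balanced on purpose, which is why the centered correlation comes out at zero; with real, unbalanced data it would usually be small rather than exactly zero.

```python
from itertools import product
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Hypothetical balanced data: every combination of x, y in 1..5.
grid = list(product(range(1, 6), range(1, 6)))
x = [a for a, _ in grid]
y = [b for _, b in grid]

# Correlation between x and the raw product term.
raw_corr = pearson(x, [a * b for a, b in zip(x, y)])

# Same correlation after centering both predictors before multiplying.
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
xc = [a - mean_x for a in x]
yc = [b - mean_y for b in y]
centered_corr = pearson(xc, [a * b for a, b in zip(xc, yc)])

print(round(raw_corr, 3), round(centered_corr, 3))
```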

As to terminology: yes, $\beta_0$ is called the "constant." On the other hand, "partial" has specific meanings in regression and so I wouldn't use that term to describe your strategy here.

Some interesting examples that will arise once in a blue moon are described at this thread.

andrea · Answer

I will borrow a paragraph from the book An Introduction to Survival Analysis Using Stata by M. Cleves, R. Gutierrez, W. Gould and Y. Marchenko, published by Stata Press, to answer your question.

It is common to read that interaction effects should be included in the model only when the corresponding main effects are also included, but there is nothing wrong with including interaction effects by themselves. [...] The goal of a researcher is to parametrize what is reasonably likely to be true for the data considering the problem at hand and not merely following a prescription.

Peter Flom · Answer

It is very rarely a good idea to include an interaction term without the main effects involved in it. David Rindskopf of CCNY has written some papers about those rare instances.

dmk38 · Answer

This is implicit in many of the answers others have given, but the simple point is that models with a product term, with and without the moderator and predictor, are just different models. Figure out what each means given the process you are modeling, and whether a model without the moderator and predictor makes more sense given your theory or hypothesis. The observation that the product term is significant only when the moderator and predictor are not included doesn't tell you anything (except maybe that you are fishing around for "significance") without a cogent explanation of why it makes sense to leave them out.

Answered by dmk38 on December 3, 2021

probabilityislogic · Answer

I would suggest it is simply a special case of model uncertainty.  From a Bayesian perspective, you simply treat this in exactly the same way you would treat any other kind of uncertainty, by either:

Calculating its probability, if it is the object of interest
Integrating or averaging it out, if it is not of interest, but may still affect your conclusions

This is exactly what people do when testing for "significant effects" by using t-quantiles instead of normal quantiles.  Because you have uncertainty about the "true noise level" you take this into account by using a more spread out distribution in testing.  So from your perspective the "main effect" is actually a "nuisance parameter" in relation to the question that you are asking.  So you simply average out the two cases (or more generally, over the models you are considering).  So I would have the (vague) hypothesis:
$$H_{\mathrm{int}}:\text{The interaction between A and B is significant}$$
I would say that although not precisely defined, this is the question you want to answer here.  And note that it is not only verbal statements such as the above which "define" the hypothesis, but the mathematical equations as well.  We have some data $D$, and prior information $I$, then we simply calculate:
$$P(H_{\mathrm{int}}|DI)=P(H_{\mathrm{int}}|I)\frac{P(D|H_{\mathrm{int}}I)}{P(D|I)}$$
(small note: no matter how many times I write out this equation, it always helps me understand the problem better. weird).  The main quantity to calculate is the likelihood $P(D|H_{\mathrm{int}}I)$.  This makes no reference to the model, so the model must have been removed using the law of total probability:
$$P(D|H_{\mathrm{int}}I)=\sum_{m=1}^{N_{M}}P(DM_{m}|H_{\mathrm{int}}I)=\sum_{m=1}^{N_{M}}P(M_{m}|H_{\mathrm{int}}I)P(D|M_{m}H_{\mathrm{int}}I)$$
Where $M_{m}$ indexes the $m$th model, and $N_{M}$ is the number of models being considered.  The first term is the "model weight" which says how much the data and prior information support the $m$th model.  The second term indicates how much the $m$th model supports the hypothesis.  Plugging this equation back into the original Bayes theorem gives:
$$P(H_{\mathrm{int}}|DI)=\frac{P(H_{\mathrm{int}}|I)}{P(D|I)}\sum_{m=1}^{N_{M}}P(M_{m}|H_{\mathrm{int}}I)P(D|M_{m}H_{\mathrm{int}}I)$$
$$=\frac{1}{P(D|I)}\sum_{m=1}^{N_{M}}P(DM_{m}|I)\frac{P(M_{m}H_{\mathrm{int}}D|I)}{P(DM_{m}|I)}=\sum_{m=1}^{N_{M}}P(M_{m}|DI)P(H_{\mathrm{int}}|DM_{m}I)$$

And you can see from this that $P(H_{\mathrm{int}}|DM_{m}I)$ is the "conditional conclusion" of the hypothesis under the $m$th model (this is usually all that is considered, for a chosen "best" model).  Note that this standard analysis is justified whenever $P(M_{m}|DI)\approx 1$ (an "obviously best" model) or whenever $P(H_{\mathrm{int}}|DM_{j}I)\approx P(H_{\mathrm{int}}|DM_{k}I)$ (all models give the same or similar conclusions).  However, if neither condition is met, then Bayes' Theorem says the best procedure is to average out the results, placing higher weights on the models which are most supported by the data and prior information.
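
As a rough sketch of that final averaging step, the snippet below combines per-model conclusions using posterior model probabilities. All numbers and model labels are hypothetical and chosen only for illustration; in practice both sets of probabilities would come from the fitted models and priors.

```python
# Hypothetical posterior model probabilities P(M_m | D, I); they must sum to 1.
posterior_model_prob = {
    "main_effects_plus_interaction": 0.7,
    "interaction_only": 0.3,
}
# Hypothetical conditional conclusions P(H_int | D, M_m, I) under each model.
prob_h_given_model = {
    "main_effects_plus_interaction": 0.60,
    "interaction_only": 0.95,
}

total_weight = sum(posterior_model_prob.values())

# Model-averaged probability of the hypothesis:
# sum over m of P(M_m | D, I) * P(H_int | D, M_m, I).
averaged = sum(
    posterior_model_prob[m] * prob_h_given_model[m] for m in posterior_model_prob
)
print(round(averaged, 3))
```

Here neither shortcut applies (no model has probability near 1, and the two conditional conclusions differ), so the averaged value of 0.705 sits between the two per-model answers.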

Frank Harrell · Answer

In my experience, not only is it necessary to have all lower order effects in the model when they are connected to higher order effects, but it is also important to properly model (e.g., allowing to be nonlinear) main effects that are seemingly unrelated to the factors in the interactions of interest. That's because interactions between $x_1$ and $x_2$ can be stand-ins for main effects of $x_3$ and $x_4$. Interactions sometimes seem to be needed because they are collinear with omitted variables or omitted nonlinear (e.g., spline) terms.
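
One way to see how an interaction can stand in for an omitted nonlinear term is to look at the correlation between a product $x_1 x_2$ and the square $x_1^2$ when the two predictors are themselves correlated. The data below are hypothetical and deterministic; this is a sketch of the collinearity only, not a full modeling exercise.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Hypothetical predictors: x2 tracks x1 with a small alternating perturbation.
x1 = list(range(-5, 6))
x2 = [v + (1 if v % 2 == 0 else -1) for v in x1]

product_term = [a * b for a, b in zip(x1, x2)]  # the interaction x1 * x2
square_term = [a * a for a in x1]               # an omitted nonlinear term

corr_predictors = pearson(x1, x2)
corr_product_square = pearson(product_term, square_term)
print(round(corr_predictors, 3), round(corr_product_square, 3))
```

If the true effect of $x_1$ were quadratic and the model contained only linear terms plus $x_1 x_2$, the product term would pick up much of the curvature, which is the stand-in behavior described above.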

Answered by Frank Harrell on December 3, 2021

Wolfgang · Answer

While it is often stated in textbooks that one should never include an interaction in a model without the corresponding main effects, there are certainly examples where this would make perfect sense. I'll give you the simplest example I can imagine.

Suppose subjects randomly assigned to two groups are measured twice, once at baseline (i.e., right after the randomization) and once after group T received some kind of treatment, while group C did not. Then a repeated-measures model for these data would include a main effect for measurement occasion (a dummy variable that is 0 for baseline and 1 for the follow-up) and an interaction term between the group dummy (0 for C, 1 for T) and the time dummy.

The model intercept then estimates the average score of the subjects at baseline (regardless of the group they are in). The coefficient for the measurement occasion dummy indicates the change in the control group between baseline and the follow-up. And the coefficient for the interaction term indicates how much bigger/smaller the change was in the treatment group compared to the control group.

Here, it is not necessary to include the main effect for group, because at baseline, the groups are equivalent by definition due to the randomization.
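
The four cell means implied by this model can be written out directly. The coefficient values below are hypothetical; the point of the sketch is only that both groups share the same baseline mean because the model has no group main effect.

```python
def expected_score(group, time, b0, b_time, b_group_time):
    """Mean score under: score ~ time + group:time (no main effect for group).

    group: 0 for C, 1 for T; time: 0 for baseline, 1 for follow-up.
    """
    return b0 + b_time * time + b_group_time * group * time


# Hypothetical coefficients, chosen only for illustration.
b0, b_time, b_group_time = 50.0, 2.0, 5.0

cells = {
    (group, time): expected_score(group, time, b0, b_time, b_group_time)
    for group in (0, 1)
    for time in (0, 1)
}
for (group, time), mean in sorted(cells.items()):
    print(f"group={group} time={time} mean={mean}")
```

With these values both groups start at 50, the control group changes by 2, and the treatment group changes by 2 + 5 = 7, so the interaction coefficient is the difference in change between the groups.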

One could of course argue that the main effect for group should still be included, so that, in case the randomization failed, this will be revealed by the analysis. However, that is equivalent to testing the baseline means of the two groups against each other. And there are plenty of people who frown upon testing for baseline differences in randomized studies (of course, there are also plenty who find it useful, but this is another issue).

whuber · Answer

You ask whether it's ever valid.  Let me provide a common example, whose elucidation may suggest additional analytical approaches for you.

The simplest example of an interaction is a model with one dependent variable $Z$ and two independent variables $X$, $Y$ in the form

$$Z = \alpha + \beta' X + \gamma' Y + \delta' X Y + \varepsilon,$$

with $\varepsilon$ a random variable having zero expectation, and using parameters $\alpha, \beta', \gamma',$ and $\delta'$.  It's often worthwhile checking whether $\alpha \delta'$ approximates $\beta' \gamma'$, because an algebraically equivalent expression of the same model is

$$Z = \alpha \left(1 + \beta X + \gamma Y + \delta X Y \right) + \varepsilon$$

$$= \alpha \left(1 + \beta X \right) \left(1 + \gamma Y \right) + \alpha \left( \delta - \beta \gamma \right) X Y + \varepsilon$$

(where $\beta' = \alpha \beta$, etc).

Whence, if there's a reason to suppose $\left( \delta - \beta \gamma \right) \sim 0$, we can absorb it in the error term $\varepsilon$.  Not only does this give a "pure interaction", it does so without a constant term.  This in turn strongly suggests taking logarithms.  Some heteroscedasticity in the residuals (that is, a tendency for residuals associated with larger values of $Z$ to be larger in absolute value than average) would also point in this direction.  We would then want to explore an alternative formulation

$$\log(Z) = \log(\alpha) + \log(1 + \beta X) + \log(1 + \gamma Y) + \tau$$

with iid random error $\tau$.  Furthermore, if we expect $\beta X$ and $\gamma Y$ to be large compared to $1$, we would instead just propose the model

$$\log(Z) = \left(\log(\alpha) + \log(\beta) + \log(\gamma)\right) + \log(X) + \log(Y) + \tau$$

$$= \eta + \log(X) + \log(Y) + \tau.$$

This new model has just a single parameter $\eta$ instead of four parameters ($\alpha$, $\beta'$, etc.) subject to a quadratic relation ($\alpha \delta' = \beta' \gamma'$), a considerable simplification.
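
The rearrangement can be checked numerically. The sketch below uses hypothetical parameter values, ignores the error term, and verifies two things: the expanded and factored forms of the mean agree, and when $\delta = \beta\gamma$ with $\beta X$ and $\gamma Y$ large, the log of the mean is close to $\eta + \log(X) + \log(Y)$.

```python
from math import isclose, log


def expanded(x, y, alpha, beta, gamma, delta):
    """alpha * (1 + beta*X + gamma*Y + delta*X*Y)."""
    return alpha * (1 + beta * x + gamma * y + delta * x * y)


def factored(x, y, alpha, beta, gamma, delta):
    """alpha*(1 + beta*X)*(1 + gamma*Y) + alpha*(delta - beta*gamma)*X*Y."""
    return (alpha * (1 + beta * x) * (1 + gamma * y)
            + alpha * (delta - beta * gamma) * x * y)


# Hypothetical parameter values and evaluation points.
alpha, beta, gamma, delta = 2.0, 3.0, 4.0, 11.5
points = [(0.5, 1.5), (2.0, 7.0), (10.0, 0.1)]
identity_holds = all(
    isclose(expanded(x, y, alpha, beta, gamma, delta),
            factored(x, y, alpha, beta, gamma, delta))
    for x, y in points
)

# With delta = beta * gamma and large beta*X, gamma*Y, the one-parameter
# log model is a close approximation to the log of the mean.
x_big, y_big = 1000.0, 1000.0
exact_log = log(expanded(x_big, y_big, alpha, beta, gamma, beta * gamma))
eta = log(alpha) + log(beta) + log(gamma)
approx_log = eta + log(x_big) + log(y_big)
gap = abs(exact_log - approx_log)
print(identity_holds, gap)
```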

I am not saying that this is a necessary or even the only step to take, but I am suggesting that this kind of algebraic rearrangement of the model is usually worth considering whenever interactions alone appear to be significant.

Some excellent ways to explore models with interaction, especially with just two and three independent variables, appear in chapters 10 - 13 of Tukey's EDA.

ayush biyani · Answer

This one is tricky and happened to me in my last project. I would explain it this way: let's say you had variables A and B which came out significant independently, and by business sense you thought that an interaction of A and B seemed good. You included the interaction, which came out significant, but B lost its significance. You would explain your model initially by showing two results. The results would show that initially B was significant, but when seen in light of A it lost its sheen. So B is a good variable, but only when seen in light of various levels of A (if A is a categorical variable). It's like saying Obama is a good leader when seen in the light of his SEAL team. So Obama*seal will be a significant variable. But Obama when seen alone might not be as important. (No offense to Obama, just an example.)

Answered by ayush biyani on December 3, 2021

Galit Shmueli · Answer

The reason to keep the main effects in the model is for identifiability. Hence, if the purpose is statistical inference about each of the effects, you should keep the main effects in the model. However, if your modeling purpose is solely to predict new values, then it is perfectly legitimate to include only the interaction if that improves predictive accuracy.

Answered by Galit Shmueli on December 3, 2021

Michael Bishop · Answer

Arguably, it depends on what you're using your model for. But I've never seen a reason not to run and describe models with main effects, even in cases where the hypothesis is only about the interaction.

Answered by Michael Bishop on December 3, 2021
