TransWikia.com

Choice between dummy variables and Likert scale in Linear Regression

Economics Asked on May 17, 2021

I want to run a linear regression based on the data gathered using a questionnaire. Several of the questions have the following form:

How much do you spend on xyz in a month?

a. Less than $50

b. $50 to less than $100

c. $100 to less than $200

d. $200 or more

This is ordinal data, with multiple categories. I am not sure how to code them.

Should I code them like a Likert scale (though Likert-scale data is treated as interval data instead):

0 for a, 1 for b, 2 for c and 3 for d

Or should I use dummy variables like this:

3 dummy variables for options b, c and d, with the respective dummy variable set equal to 1 if the option is chosen and 0 otherwise. All dummy variables are set to 0 if option a is chosen.

In that case, I am concerned that all sense of ordinality is lost and that I am treating the options merely as nominal data. I think I would then be unable to comment on whether, for example, the more you spend on xyz, the more weight you gain.
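To make the comparison concrete, the two codings correspond to the following specifications. The notation is introduced here for illustration only: s_i is the 0 to 3 code, D_{b,i}, D_{c,i}, D_{d,i} are the three dummies (option a is the reference), and x_i stands for any other regressors.

```latex
\begin{aligned}
\text{Score coding:}\quad & y_i = \alpha + \beta\, s_i + \gamma x_i + \varepsilon_i,
  \qquad s_i \in \{0, 1, 2, 3\} \\
\text{Dummy coding:}\quad & y_i = \alpha + \delta_b D_{b,i} + \delta_c D_{c,i}
  + \delta_d D_{d,i} + \gamma x_i + \varepsilon_i \\
\text{Score model as a special case:}\quad & \delta_b = \beta,\;
  \delta_c = 2\beta,\; \delta_d = 3\beta
\end{aligned}
```

Seen this way, the score coding imposes equal steps between adjacent categories, while the dummy coding drops that restriction. The ordering is not necessarily lost: one can still check whether the estimated coefficients increase from option b to option d.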

Which one should be used? Or should some completely different coding be used?

I intend to use these variables only as regressors; my regressand is a ratio-scale variable.

One Answer

Since your question is about coding, I will primarily address that.

You should definitely code it as you suggested:

0 for a, 1 for b, 2 for c and 3 for d

regardless of whether you plan to use it as an ordinal or as a categorical variable.

Let me explain. Nowadays virtually any language or statistical program can turn ordinal or categorical variables into dummies with almost no effort.

In R you can simply wrap the variable in factor() to turn it into separate dummies (note that is.factor() only tests whether something is already a factor; it does not convert anything), so you could use:

lm(dep_var ~ ind_var + factor(variable), data = XY)

In Stata you can use the i. prefix (i.variable) to turn any ordinal or categorical variable into dummies; you would run something like:

reg depvar independentvar i.variable
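For readers working outside R or Stata, the default expansion those tools perform (treatment coding, with the lowest level as the omitted reference) can be sketched in plain Python. The answer labels and the helper names encode and to_dummies are hypothetical, chosen for illustration:

```python
# Hypothetical sketch: store the answer once as a 0-3 code, then derive
# treatment (reference-level) dummies from it, similar to what R's factor()
# and Stata's i. prefix do by default.
RESPONSE_CODES = {"a": 0, "b": 1, "c": 2, "d": 3}  # assumed answer labels


def encode(answers):
    """Map answer letters to a single 0-3 ordinal code."""
    return [RESPONSE_CODES[ans] for ans in answers]


def to_dummies(codes, levels=(0, 1, 2, 3)):
    """Expand codes into one 0/1 column per non-reference level.

    The first level (0, i.e. option a) is the omitted reference category,
    so a respondent who chose option a gets all zeros.
    """
    others = levels[1:]  # drop the reference level
    return [[1 if code == level else 0 for level in others] for code in codes]


answers = ["a", "c", "d", "b", "a"]
codes = encode(answers)
dummies = to_dummies(codes)
print(codes)    # [0, 2, 3, 1, 0]
print(dummies)  # [[0, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0]]
```

Because option a is the reference, each dummy coefficient reads as a difference from respondents who spend less than $50. One caveat on the R side: if the variable is declared with ordered() rather than factor(), lm() uses polynomial contrasts by default, so the coefficients are no longer per-category dummies.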

Moreover, coding the variable as separate dummies yourself is more time-consuming than coding it as one variable (even if only marginally, since it takes more lines of code).

Consequently, whether you want to include it in the final model as an ordinal variable or as a set of dummies, in most standard languages and programs it is more efficient (i.e. it takes less time to code) to code it as a single variable on a Likert-type scale.

As for what you should do from a scientific standpoint, that depends on the details of your research. I can imagine you being able to defend either including the variable as a single ordinal scale or as separate dummies; both would work in principle. My sense is that you would likely want to include it as dummies, since that gives a better interpretation: each coefficient measures the difference relative to option a, without assuming that every step up in spending has the same effect. However, if you have little data and many regressors, it might be worthwhile to keep it as a single ordinal variable, which uses one parameter instead of three.
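To see this trade-off in numbers, here is a minimal plain-Python sketch that fits both codings by ordinary least squares on a small made-up data set. The data and the helpers ols and rss are hypothetical and for illustration only; in practice you would use lm() or regress as above.

```python
# Hypothetical sketch: compare the single-score model with the dummy model
# by OLS, solving the normal equations (X'X) b = X'y by Gauss-Jordan elimination.


def ols(X, y):
    """Return OLS coefficients for design matrix X (list of rows) and outcome y."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry into place.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                scale = A[r][col] / A[col][col]
                A[r] = [a - scale * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def rss(X, y, b):
    """Residual sum of squares for coefficients b."""
    return sum((yi - sum(x * bj for x, bj in zip(row, b))) ** 2
               for row, yi in zip(X, y))


# Made-up data: spending code 0-3 and an outcome where the jump from
# code 0 to code 1 is larger than the later jumps (i.e. not linear in the code).
codes = [0, 0, 1, 1, 2, 2, 3, 3]
y = [10.0, 11.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]

X_score = [[1.0, float(c)] for c in codes]           # intercept + one slope
X_dummy = [[1.0] + [1.0 if c == lvl else 0.0 for lvl in (1, 2, 3)]
           for c in codes]                           # intercept + 3 dummies

b_score = ols(X_score, y)
b_dummy = ols(X_dummy, y)
print("score model:", b_score, "RSS:", rss(X_score, y, b_score))
print("dummy model:", b_dummy, "RSS:", rss(X_dummy, y, b_dummy))
```

On these made-up data the dummy coefficients come out at about 5, 7 and 9: they rise with spending, so the ordering is still visible. The score model forces a single slope of about 2.9 per category step and has a larger residual sum of squares (about 7.4 versus 2.0). The price of the dummy version is two extra parameters, which matters when observations are scarce.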

Answered by 1muflon1 on May 17, 2021
