TransWikia.com

Length of questionnaire for long-string analysis

Psychology & Neuroscience Asked by Mary Smirnova on October 5, 2021

I have a dataset that I am trying to screen for careless responding. I am having a little trouble finding research on the minimum length a questionnaire needs to be in order to do long-string analysis. Huang et al. (2012) suggest flagging respondents whose longest string of identical answers is equal to or greater than half the length of the scale. This seems problematic for shorter scales: for example, if someone gives the same answer to 3 consecutive items on a 6-item scale, they would be flagged, particularly if there are no reversed items in the scale. Curran (2016) mentions that a 20-item scale might be problematic because it is too short, but the paper they cite doesn't explain this or what would determine whether a scale is too short.

I was wondering if someone might have some insight on this or some papers I should refer to. Thank you for reading!

One Answer

Here are a few general principles that I would apply:

Does the scale have a mix of positively and negatively worded items? If so, you'd need fewer items to identify strange cases. If it is all positively worded, then it's possible that a conscientious respondent could, for example, agree with a fairly large number of items.

Does it measure multiple constructs or does it measure a single dimension? If it measures more than one dimension (e.g., like a five factor personality test), then you'd expect to see more diversity in responses.

From first principles, it can be helpful to consider whether there is any conceivable reason to provide such a sequence of responses that would reflect conscientious survey completion.

You may also want to think about your survey design and the incentive structure to consider what incentives there are to respond in a given way. Some survey designs make that mode of responding more efficient.

Another general strategy is to calculate an index and plot the distribution of cases on that index. If you see a break in the histogram, this can suggest that a discrete process (i.e., non-conscientious responding) has generated the data on the other side of the break. This can be helpful when looking at things like survey completion times. In the case of providing only one response option to a set of items, you can count the number of distinct response options each person used, tabulate those counts, and check whether the vast majority of respondents use a range of options. Alternatively, if a response pattern looks like a common response type, then you'd be less likely to conclude that it is problematic.

Here is one way to get this in R, assuming data is your dataset and items is a vector of item names.

# Count the number of distinct response options each person used
data$unique_responses <- apply(data[, items], 1, function(X) length(unique(X)))
table(data$unique_responses)
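
If you also want the long-string index itself (the longest run of identical consecutive responses, as in Huang et al., 2012), a minimal sketch along the same lines, again assuming data and items as above, might be:

# Longest run of identical consecutive responses per person (long-string index)
data$longstring <- apply(data[, items], 1,
                         function(X) max(rle(as.character(X))$lengths))
table(data$longstring)

# One possible flagging rule (an assumption based on the half-scale cut-off
# discussed in the question, not a recommendation): flag cases whose longest
# run covers half or more of the scale
data$flag_longstring <- data$longstring >= length(items) / 2

Tabulating and plotting data$longstring before applying any cut-off lets you look for the kind of break in the distribution described above, rather than relying on a fixed rule.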

Answered by Jeromy Anglim on October 5, 2021
