How to handle mixed-models with low sample size as well as low frequency of categorical level of interest?

Question

Overview
I am working on analyzing data from an observational study, details of which is as follows: 18 subjects with type 1 diabetes (T1D) + 14 healthy controls were recruited for a four-week long naturalistic driving study. Throughout the study period, subjects and their vehicles were constantly monitored to collect information on their blood glucose (BG) levels (only applies to T1D subjects) and their driving behavior. My part of the analysis is to focus on how acute (in-vehicle) glycemic episodes (details below) affect driver behavior at stop-controlled intersections.

Response variable (0/1): Stopping response is coded as 1 when a driver makes an unsafe stopping maneuver (rolling stop or no stop) and 0 when a driver makes a safe stopping maneuver (full stop)
Control variables: lead vehicle status (three levels: none present, present without effect, and present with effect), crossing vehicle status (three levels: none present, present without effect, and present with effect), and intersection type (minor-road-only or all-way stop)
Explanatory variable of interest: Glycemic episode of the driver during the stop sign encounters. Glycemic episode is computed from the blood glucose readings and is coded as a factor variable with four levels: hypo (BG is below a certain threshold for an extended period of time), hyper (BG is above a certain threshold for an extended period of time), normal (BG is within bounds), and control (for healthy controls). Clinical literature has shown that hyper episodes do not have a drastic effect on T1D patients in terms of cognition and other motor functions that are required for safe driving, but hypo episodes do (but they are infrequent). So the primary interest in the analysis is to see if hypo episodes in particular make drivers more risky at stop intersections.

Response variable and control variables were extracted through manual video review (watching video clips of drivers navigating through stop intersections). Also, whether a lead vehicle (the vehicle the subject is following in the same lane when approaching the intersection) or a crossing vehicle (the vehicle that crosses subject driver's path: think of North-Bound/West-Bound type situation at a 4-way stop) had an effect on subject driver's stopping behavior was based on human judgement. Data on glycemic episodes were made available to me.
Modeling approach
Since each subject gave multiple responses at stop-controlled intersections, I decided to use the mixed-effects logistic regression model using lme4::glmer() function in R. Additionally, I considered fitting models to three different subsets of the data (say subset1, subset2, and subset3): (a) stopping responses from all subjects (T1D + Control), (b) stopping responses from just T1D subjects, and (c) stopping responses from just T1D subjects who had both hypo and normal episodes. This data partitioning was done for overall data (say data1) that included cases where a lead vehicle or a crossing had an impact on subject driver's behavior and for the data obtained after omitting such cases (say data2). So, I tried fitting models to 6 variants of the data: data1-subset1, data1-subset2, ..., data2-subset3. Furthermore, I tested for both the subject and the intersection (same intersection was visited multiple times, partially-crossed) random effects terms in the model. So, overall I tried the following 12 models:
mod1 <- unsafe_response ~ glycemic_episode + control_variables + (1|subj), data = data1-subset1
mod2 <- unsafe_response ~ glycemic_episode + control_variables + (1|subj) + (1|intersection), data = data1-subset1
mod3 <- unsafe_response ~ glycemic_episode + control_variables + (1|subj), data = data1-subset2
mod4 <- unsafe_response ~ glycemic_episode + control_variables + (1|subj) + (1|intersection), data = data1-subset2
mod5 <- unsafe_response ~ glycemic_episode + control_variables + (1|subj), data = data1-subset3
mod6 <- unsafe_response ~ glycemic_episode + control_variables + (1|subj) + (1|intersection), data = data1-subset3

I repeated the same six model structure by removing the control variables and using data2-subset* data variants, e.g., unsafe_response ~ glycemic episode + (1|subj), data = data2-subset1.
Here's the descriptive summary statistics for the model data frames:
data1-subset*:

data2-subset*: (Notice records where lead vehicle and/or crossing vehicles had an effect on subject driver's behavior are omitted and control variables are no longer used when modeling using this data)

Problem
Unfortunately, none of the models that I tested resulted in glycemic episodes being a significant predictor in explaining the stopping responses. In my opinion, this may be due to the low sample size (I cannot increase it because it includes the entire data collection and these are all the stop sign encounters that I found) and low frequency of hypo glycemic episodes (can't increase this frequency either). What other modeling approaches can I use for this data besides mixed-effects modeling? Say after doing all this, the effects of glycemic levels did not turn out to be significant. Will this be a defensible analysis for a paper? (I have drafted the paper, but am not sure if I will face problems publishing it because the effects of interest are non-significant)
Note: It's a novel study and I haven't seen any paper where researchers connected clinical data and naturalistic driving data to model driver behavior. So, can these non-significant finding be reported as a pilot test results?

Robert Long · Answer

Statistically insignificant results are not a reason to not publish. Unfortunately many people make this mistake (including reviewers) and that is a big part of the reason for publication bias. Perhaps the effects sizes are interesting in themselves, but if not and if these hypoglycaemic episodes do not actually affect driving behaviour then that is a good thing for safety, is it not ? On the other hand if the study and analysis is severly under-powered then that is a serious problem.
So I think you can look at several things here.

Mixed effects models are a good approach to the repeated measures problem. From your explanation you need to include random intercepts for both participant and intersection.

Subsetting the data will cause a big reduction in statistical power.  I don't think you need to do this. Retain the variables that you used to split/subset on fixed effects.

Do you have access to the underlying glucose readings ? If include these rather than the categorized version, this will increase statistical power - and you can also allow for nonlinear effects.

A different approach to mixed effects model is a generalized estimating equations model (GEE). However I doubt there would be much difference in power and the above considerations still apply.

How to handle mixed-models with low sample size as well as low frequency of categorical level of interest?

Overview

Modeling approach

Problem

One Answer

Add your own answers!

Ask a Question