What Statistical principles are being violated by comparing specific Trainer Fatality Rates to Race Track Fatality rates?

Question

A Hall of Fame Trainer of Thoroughbred Racehorses has been banned from a Race Track because 7 horses under his care have been injured while training or racing in the past year.

Critics cite a Nation-wide Fatality Rate of 1.6 Fatalities per Thousand Entries/Starts in Races: https://thorocap.com/2019/12/01/hollendorfer-again-in-the-midst-of-controversy/?fbclid=IwAR3iSUkqz2SDKLBmGORAsQGdmhJNRnsbooakrk-yBbUf-zOzEFL58RwUCM0

They claim that the incidence of Fatalities associated with his horses is "far above average."

I claim that it is inappropriate and scientifically invalid to compare the incident rate of fatalities for one specific trainer with the Base Rate per Thousand Entries aggregated for ALL Race Tracks across the country.

I am looking for reasons why it is a flawed approach.

So far, it seems to me that looking at the Fatalities Per Thousand Races is more like Sampling with Replacement, while the Rate of Fatalities per Specific Trainer is more like Sampling Without Replacement.

The trainer in question, for example, has had only 477 Starts in 2019, and many of those involve multiple starts by the same horses, which are given multiple Training Runs and races during their campaigns.

He has thus had nowhere near the Thousand Entries/Starts used in the benchmark derived for ALL Race Tracks, so in a sense the Rates of Sampling are not equivalent.
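One way to put the two figures on a common scale is to convert the national rate into an expected count for the trainer's own number of starts. The following is only a minimal sketch. It assumes fatalities occur independently at a constant rate per start (a Poisson model, which the critics do not state), and it treats the 7 incidents cited above as the observed count, even though some of them occurred in training rather than in races. The function names are illustrative.

```python
import math

NATIONAL_RATE_PER_1000 = 1.6  # national figure quoted by the critics
TRAINER_STARTS = 477          # trainer's 2019 starts, from the question
OBSERVED_INCIDENTS = 7        # assumption: all 7 cited incidents counted as events


def expected_events(rate_per_1000, starts):
    """Expected number of events if the national rate applied to these starts."""
    return rate_per_1000 * starts / 1000.0


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_tail(k, lam):
    """P(X >= k) for a Poisson variable with mean lam."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


lam = expected_events(NATIONAL_RATE_PER_1000, TRAINER_STARTS)
p_tail = poisson_tail(OBSERVED_INCIDENTS, lam)

print(f"Expected events at the national rate: {lam:.3f}")
print(f"P(at least {OBSERVED_INCIDENTS} events | national rate): {p_tail:.2e}")
```

The result is only as good as the assumptions: if training incidents are included in the numerator, the denominator should arguably include training runs as well, which the 477 Starts figure does not.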

Moreover, if you look at the Time Span of 1,000 Entries at any major track, that would be equivalent to 10 days of racing with 10 races per day and 10 horses in each race.

None of those horses has had to run more than once in that time span, so their risk of injury on a single visit to the racetrack can be expected to be lower than that of a Stable of Horses brought repeatedly to train and race at a track.

Because of this, it seems likely that 10-day spans (1,000 Entries) without a single injury would occur quite frequently.
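How often a fatality-free span of 1,000 Entries should occur can be estimated under a simple model. The sketch below assumes every entry carries the same independent risk of 1.6 per thousand. That is a strong assumption, made only to get a baseline figure, and the function name is illustrative.

```python
PER_ENTRY_RISK = 1.6 / 1000  # assumption: the national rate applies to every entry


def prob_no_fatality(per_entry_risk, n_entries):
    """Probability of zero fatalities in n independent entries."""
    return (1.0 - per_entry_risk) ** n_entries


p_zero_track = prob_no_fatality(PER_ENTRY_RISK, 1000)   # 10 days x 10 races x 10 horses
p_zero_trainer = prob_no_fatality(PER_ENTRY_RISK, 477)  # the trainer's 2019 starts

print(f"P(no fatality in 1000 entries): {p_zero_track:.3f}")
print(f"P(no fatality in 477 starts):   {p_zero_trainer:.3f}")
```

Under these assumptions the probability depends only on the number of entries, not on whether they come from many horses or from a few horses starting repeatedly. Any real difference between the two cases would have to come from a per-entry risk that is not constant.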

However, each time a specific trainer takes his horses out on the track, there is a repeated risk of injury which may be cumulative in nature.

For this reason, the data derived from these horses have more in common with Sampling Without Replacement, in that the odds of something occurring would increase proportionately with the number of "draws".
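The effect of repeated "draws" on a single horse can be illustrated with a short calculation. This is a sketch only. It assumes a constant, independent per-start risk equal to the national figure, and the function name and the example numbers of starts are illustrative.

```python
PER_START_RISK = 1.6 / 1000  # assumption: same risk on every start


def cumulative_risk(per_start_risk, n_starts):
    """Probability that a horse suffers at least one fatal event in n starts."""
    return 1.0 - (1.0 - per_start_risk) ** n_starts


# Hypothetical campaign lengths, chosen only for illustration.
for n in (1, 5, 10, 20):
    print(f"{n:2d} starts: cumulative risk = {cumulative_risk(PER_START_RISK, n):.4f}")
```

With a constant per-start risk, the per-horse probability grows almost in proportion to the number of starts, while the rate per start stays the same. A genuinely cumulative effect, such as wear from repeated outings, would require a risk that rises with each start, which this sketch does not model.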

Additional problems would be that there is no discussion of the Distribution and Range of the Fatal Event data.

It is highly unlikely that the data follow a Normal Distribution, and Heteroskedasticity ought to complicate the assessment of what is a "Normal" rate of Fatalities...and how many fatalities should be considered "Far higher than average."

The majority of Trainers are smaller "mom and pop" operations with only a handful of horses. The chances of a fatal injury in stables where the trainers have a much more "Hands-on", intimate and direct connection with each horse therefore ought to be lower than in a Stable with 100 horses, where the daily care and interaction is left to low-paid grooms and assistants.

What other flaws are associated with comparing the number of fatal injuries associated with a Single Barn to Nation Wide aggregate fatality rates per thousand Entries/Starts?

Statistically speaking, should it be much easier for a Race Track to go through One Thousand Races without a single Fatality than for a single Trainer to go through One Thousand Races without a single fatal injury among his horses?

What Statistical principles are being violated by comparing specific Trainer Fatality Rates to Race Track Fatality rates?
