TransWikia.com

Public normally distributed data for teaching intro stats

Open Data Asked by philophilosophia on September 29, 2021

Background:
I’m teaching an intro stats course, and this term I have decided to demonstrate the methods on real-world public data sets instead of synthetic data. I was surprised that I couldn’t find basic data such as the height, weight, or IQ of men and women (which are famously well approximated by a Gaussian). I do find parameters (the mean and variance of the weight of Americans, for example), but I don’t want to synthesize a Gaussian from parameters. Rather, I’m looking for actual data, so that students experience the noisiness of real data and see how approximations work. I have the same problem finding non-Normal data, e.g., wealth distributions and other heavy-tailed ones: parameters exist, but I cannot find actual data sets.

TLDR:
For an introductory stats course, I’m looking for publicly available data sets with medium sample sizes, i.e., $N=O(10^3)$ or $O(10^4)$, preferably with close-to-Gaussian distributions, but anything is useful.

One Answer

You can find some of the best publicly available datasets on Kaggle, many of them with kernels/notebooks for reference. It is a good place to find relevant data for your teaching. You need to sign up to download the datasets.
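Once a dataset is downloaded, a quick check of how close a column is to Gaussian might look like the sketch below. It is only an illustration under stated assumptions: the helper names `load_column` and `normality_summary` are hypothetical, the CSV is assumed to have a header row, and the demo sample is generated with `random.gauss` purely so the sketch runs without a file (for the course, replace it with a real column). For a Gaussian, skewness and excess kurtosis are both close to 0, and about 68% of values lie within one standard deviation of the mean.

```python
import csv
import random
import statistics


def load_column(path, column):
    """Read one numeric column from a CSV file with a header row.

    Blank or non-numeric cells are skipped. `path` and `column` are
    whatever the downloaded dataset uses (hypothetical here).
    """
    values = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                values.append(float(row[column]))
            except (TypeError, ValueError):
                continue
    return values


def normality_summary(values):
    """Return simple moment-based diagnostics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    z = [(v - mean) / sd for v in values]  # standardized values
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        # Third standardized moment: about 0 for symmetric data.
        "skewness": sum(t ** 3 for t in z) / n,
        # Fourth standardized moment minus 3: about 0 for a Gaussian.
        "excess_kurtosis": sum(t ** 4 for t in z) / n - 3.0,
        # Share of points within one sd of the mean: about 0.683 for a Gaussian.
        "within_1sd": sum(1 for t in z if abs(t) <= 1) / n,
    }


# Stand-in sample so the sketch is runnable on its own (assumption:
# replace with load_column("your_file.csv", "your_column") for real data).
random.seed(0)
sample = [random.gauss(170, 7) for _ in range(5000)]
for name, value in normality_summary(sample).items():
    print(f"{name}: {value:.3f}")
```

Running the same summary on a heavy-tailed column (incomes, for example) should show a clearly positive skewness and excess kurtosis, which gives students a concrete contrast with the near-Gaussian case.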

Answered by Pluviophile on September 29, 2021
