TransWikia.com

Is my data appropriate for Hierarchical Clustering?

Data Science Asked on January 14, 2021

I am new to clustering and want to check whether the symptoms reported (for example cough, sneezing, shortness of breath) differ across comorbidity groups (obesity, asthma, etc.).

My data is in two formats:

  • the original data, with 53,000 rows
  • an aggregate table derived from those 53,000 rows: the frequency and percentage of each symptom within each comorbidity group, which gives a table of 9 comorbidities and 17 symptoms (I use the percentages rather than the counts).
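To make the aggregate format concrete, here is a minimal sketch of how such a percentage table could be built with pandas. The column names and toy values are hypothetical, and it assumes one row per patient, a single comorbidity label per row, and one 0/1 column per symptom; the real data may be laid out differently.

```python
import pandas as pd

# Hypothetical toy data: one row per patient, one 0/1 column per symptom.
# Column names and values are illustrative assumptions, not the real data.
raw = pd.DataFrame({
    "comorbidity": ["obesity", "obesity", "asthma", "asthma", "asthma"],
    "cough": [1, 0, 1, 1, 0],
    "sneezing": [0, 0, 1, 0, 1],
    "shortness_of_breath": [1, 1, 1, 0, 1],
})

# The mean of a 0/1 indicator within a group is the share of patients
# in that group who reported the symptom; multiply by 100 for percentages.
pct = raw.groupby("comorbidity").mean() * 100

# Rows are comorbidity groups, columns are symptoms.
print(pct.round(1))
```

With the real data this would produce the 9 by 17 table described above, one row per comorbidity group.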

I have already run hierarchical clustering on the aggregate table, but I am not sure whether that is a valid approach. On the original data I ran hierarchical clustering on only 5,000 rows, and even that was computationally very expensive.
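For reference, the aggregate-level clustering can be sketched with SciPy as below. This is only an illustration under stated assumptions: the 9 by 17 matrix is filled with random placeholder percentages, and average linkage with Euclidean distance is an arbitrary choice, since the linkage method and metric actually used are not stated here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Placeholder for the real table: 9 comorbidity groups (rows) by
# 17 symptom percentages (columns). Values are random, not real data.
pct = rng.uniform(0, 100, size=(9, 17))

# Agglomerative clustering of the 9 rows. The method and metric are
# assumptions made for this sketch.
Z = linkage(pct, method="average", metric="euclidean")

# Cut the tree so that at most 4 flat clusters remain.
labels = fcluster(Z, t=4, criterion="maxclust")
print(labels)
```

With only 9 observations the linkage step is effectively instantaneous, which is why the aggregate table is cheap to cluster compared with the raw rows.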

Thus in summary my questions are:

  1. Is the aggregate-level data (the second format above) suitable for clustering? Agglomerative clustering gives me 4 clusters of comorbidities, and I hope that choice is a reasonable one.

  2. I have also clustered the original data, but I do not understand why it is so slow on 10,000 rows, let alone 53,000. I therefore reduced the sample to 5,000 rows, which was the most workable option for me. Is this slowness on 53,000 rows expected?
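As background to the second question: standard agglomerative algorithms work from all pairwise distances, so memory grows quadratically with the number of rows. The sketch below is a back-of-the-envelope estimate that assumes a condensed distance matrix of n(n-1)/2 values stored as 8-byte floats; the helper name is made up for illustration, and actual usage depends on the implementation.

```python
def condensed_distance_bytes(n_rows, bytes_per_value=8):
    """Bytes needed to store one distance per unordered pair of rows."""
    n_pairs = n_rows * (n_rows - 1) // 2
    return n_pairs * bytes_per_value


# Compare the sample sizes mentioned in the question.
for n in (5_000, 10_000, 53_000):
    gib = condensed_distance_bytes(n) / 1024**3
    print(f"{n:>6} rows -> {gib:.2f} GiB")
```

Under these assumptions the distances for 5,000 rows fit in roughly a tenth of a GiB, while 53,000 rows need on the order of 10 GiB before any clustering work starts, which is consistent with the slowdown described.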

Thank you in advance for your help.
