TransWikia.com

Different representations of dendrograms

Data Science Asked by Noppawee Apichonpongpan on May 3, 2021

I have a dendrogram represented in a format I don’t understand:

(K_5:1.000030e+00,((K_1:2.000000e-05,(K_2:1.000000e-05,K_3:1.000000e-05):1.000000e-05):1.000000e-05,K_4:3.000000e-05)0.806:1.000000e+00):0.000000e+00;

I am not sure how to interpret the above.
It is an output of hierarchical clustering.

K_1, K_2, K_3, K_4, K_5 are the data points.

I have other dendrograms represented in the following format:

[x_1,x_2,x_3,x_4,x_5] (we start with one big cluster and split a cluster at each step)

[x_1,x_2][x_3,x_4,x_5]

[x_1,x_2][x_3,x_5][x_4]

[x_1][x_2][x_3,x_5][x_4]

[x_1][x_2][x_3][x_5][x_4]

I want a way to convert between these two representations.

One Answer

This output represents the dendrogram as a tree, in what looks like the Newick format. The innermost parentheses represent the deepest parts of the tree. For instance, the top (root) of the tree starts with the pair K_5 and a subtree; this subtree is in turn made of another subtree and K_4, and so on.

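If we ignore the numerical values (the number after each colon is presumably a branch length, i.e. a distance, and 0.806 looks like a label attached to an internal node), we have this: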

(K_5,
  (
    (K_1,
       (
         K_2,K_3
       )
    ),
    K_4
  )
)

Which represents this tree:

 --------------------
 |                  |
 |            -------------
K_5           |           |
           -------       K_4
           |     |
          K_1  -----
               |   |
              K_2 K_3
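
As a rough illustration of the step above (dropping the numbers and reading the nesting), here is a minimal Python sketch. The function names strip_annotations and parse_nested are made up for this example, and the regular expressions assume leaf names contain no colons, commas or parentheses. It is not a full Newick parser.

```python
import re

newick = ("(K_5:1.000030e+00,((K_1:2.000000e-05,(K_2:1.000000e-05,"
          "K_3:1.000000e-05):1.000000e-05):1.000000e-05,"
          "K_4:3.000000e-05)0.806:1.000000e+00):0.000000e+00;")

def strip_annotations(s):
    """Remove the trailing ';', branch lengths and internal node labels."""
    s = s.strip().rstrip(";")
    s = re.sub(r":[0-9.eE+-]+", "", s)   # drop ':<number>' branch lengths
    s = re.sub(r"\)[^,();]+", ")", s)    # drop labels right after ')'
    return s

def parse_nested(s):
    """Turn '(A,(B,C))' into nested tuples: ('A', ('B', 'C'))."""
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # skip '('
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # skip ','
                children.append(node())
            pos += 1                      # skip ')'
            return tuple(children)
        start = pos                       # otherwise read a leaf name
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return node()

tree = parse_nested(strip_annotations(newick))
print(tree)  # ('K_5', (('K_1', ('K_2', 'K_3')), 'K_4'))
```

For real data, an existing Newick reader from a phylogenetics library is probably a safer choice than hand-rolled regular expressions.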

Reading the tree from the root down, with one split per step, it can then be converted to the desired format:

[K_1 , K_2 , K_3 , K_4 , K_5]
[K_1 , K_2 , K_3 , K_4] [K_5]
[K_1 , K_2 , K_3] [K_4] [K_5]
[K_1] [K_2 , K_3] [K_4] [K_5]
[K_1] [K_2] [K_3] [K_4] [K_5]
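
The conversion above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tree is written by hand as nested tuples, the names leaves and divisive_steps are invented for the example, and at each step the leftmost cluster that can still be split is chosen. For this tree there is only ever one such cluster, so the choice does not matter; in a general dendrogram the split order should follow the merge heights (the distances), which this sketch ignores.

```python
# The tree from the answer, written by hand as nested tuples (distances dropped).
tree = ("K_5", (("K_1", ("K_2", "K_3")), "K_4"))

def leaves(t):
    """Return all leaf names under a subtree."""
    if isinstance(t, str):
        return [t]
    return [leaf for child in t for leaf in leaves(child)]

def divisive_steps(root):
    """List the partitions obtained by splitting one cluster per step."""
    clusters = [root]
    steps = [sorted(sorted(leaves(c)) for c in clusters)]
    while any(not isinstance(c, str) for c in clusters):
        # Assumption: split the leftmost cluster that is not a single point.
        i = next(k for k, c in enumerate(clusters) if not isinstance(c, str))
        clusters[i:i + 1] = list(clusters[i])   # replace it by its children
        steps.append(sorted(sorted(leaves(c)) for c in clusters))
    return steps

steps = divisive_steps(tree)
for partition in steps:
    print(" ".join("[" + " , ".join(c) + "]" for c in partition))
```

Going the other way (from the list of partitions back to a tree) amounts to recording which cluster was split at each step and making the two resulting clusters its children.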

Correct answer by Erwan on May 3, 2021
