Display Cluster Labels For A Scipy Dendrogram
I'm using hierarchical clustering to cluster word vectors, and I want the user to be able to display a dendrogram showing the clusters. However, since there can be thousands of wor
Solution 1:
You are correct about using the leaf_label_func parameter.
In addition to creating a plot, the dendrogram function returns a dictionary (the docs call it R) containing several lists. The leaf_label_func you create must take a value from R["leaves"] and return the desired label. The easiest way to set labels is to run dendrogram twice: once with no_plot=True to get the dictionary used to build your label map, and then again to create the plot.
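import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
randomMatrix = np.random.uniform(-10, 10, size=(20, 3))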
linked = linkage(randomMatrix, 'ward')
labels = ["A", "B", "C", "D"]
p = len(labels)
plt.figure(figsize=(8,4))
plt.title('Hierarchical Clustering Dendrogram (truncated)', fontsize=20)
plt.xlabel('Look at my fancy labels!', fontsize=16)
plt.ylabel('distance', fontsize=16)
# call dendrogram to get the returned dictionary
# (plotting parameters can be ignored at this point)
R = dendrogram(
    linked,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=p,                    # show only the last p merged clusters
    no_plot=True,
)
print("values passed to leaf_label_func\nleaves : ", R["leaves"])
# create a label dictionary
temp = {R["leaves"][ii]: labels[ii] for ii in range(len(R["leaves"]))}
def llf(xx):
    return "{} - custom label!".format(temp[xx])
## This version gives you your label AND the count
# temp = {R["leaves"][ii]: (labels[ii], R["ivl"][ii]) for ii in range(len(R["leaves"]))}
# def llf(xx):
#     return "{} - {}".format(*temp[xx])
dendrogram(
    linked,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=p,                    # show only the last p merged clusters
    leaf_label_func=llf,
    leaf_rotation=60.,
    leaf_font_size=12.,
    show_contracted=True,   # to get a distribution impression in truncated branches
)
plt.show()
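In truncated mode, the ids passed to leaf_label_func are either original observation indices (below n, the number of observations) or cluster ids n + i, where i is the row of the linkage matrix that created that cluster; column 3 of that row holds the cluster's size. A minimal sketch, assuming you want each truncated leaf labeled with its member count rather than a hand-written name (the helper `size_label` is hypothetical), that derives labels from the linkage matrix directly:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(20, 3))
Z = linkage(X, "ward")
n = X.shape[0]  # number of original observations

def size_label(leaf_id):
    # Hypothetical helper: ids below n are single observations,
    # ids >= n are clusters created at row (leaf_id - n) of Z.
    if leaf_id < n:
        return f"obs {leaf_id}"
    return f"cluster {leaf_id} ({int(Z[leaf_id - n, 3])} items)"

# no_plot=True still computes R["ivl"] using leaf_label_func.
R = dendrogram(Z, truncate_mode="lastp", p=4, no_plot=True,
               leaf_label_func=size_label)
print(R["leaves"])
print(R["ivl"])
```

Because the sizes come from Z itself, this approach needs only one dendrogram call and no separate label list.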
Solution 2:
You can simply pass your labels to the labels parameter:
hierarchy.dendrogram(Z, labels=label_list)
Here is a good example using a pandas DataFrame:
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt
data = [[24, 16], [13, 4], [24, 11], [34, 18], [41, 6], [35, 13]]
frame = pd.DataFrame(np.array(data), columns=["Rape", "Murder"],
                     index=["Atlanta", "Boston", "Chicago", "Dallas", "Denver", "Detroit"])
Z = hierarchy.linkage(frame, 'single')
plt.figure()
dn = hierarchy.dendrogram(Z, labels=frame.index)
plt.show()
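With the labels parameter, dendrogram handles the ordering itself: the returned R["ivl"] holds the tick labels from left to right, and each one is the label of the observation at the same position in R["leaves"]. A short sketch that checks this on the same six-city data, skipping the plot with no_plot=True:

```python
import numpy as np
from scipy.cluster import hierarchy

data = np.array([[24, 16], [13, 4], [24, 11], [34, 18], [41, 6], [35, 13]])
cities = ["Atlanta", "Boston", "Chicago", "Dallas", "Denver", "Detroit"]

Z = hierarchy.linkage(data, "single")
R = hierarchy.dendrogram(Z, labels=cities, no_plot=True)

# Each tick label is the label of the observation drawn at that position.
for leaf, tick in zip(R["leaves"], R["ivl"]):
    print(leaf, tick)
```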
Solution 3:
It seems to me that @coradek's answer has a small mistake, though it was very helpful.
I used his code (with df as a pandas DataFrame) with this correction:
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
plt.figure(figsize=(20, 10))
labelList = df.apply(lambda x: f"{x['...']}", axis=1)
Z = linkage(df[["..."]])
R = dendrogram(Z, no_plot=True)
labelDict = {leaf: labelList[leaf] for leaf in R["leaves"]}
dendrogram(Z, leaf_label_func=lambda x: labelDict[x])
plt.show()
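This is needed because the code presented above always gave me the same order of ticks: it assigns the ii-th label to the ii-th leaf as drawn, instead of looking up the label of the observation each leaf represents.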
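The difference between the two mappings can be checked without plotting. In the sketch below (the random data and the `item{i}` names are illustrative), `by_position` mirrors the Solution 1 dictionary, which hands out labels in list order no matter which observation sits at each leaf, while `by_index` mirrors the correction, which agrees with what dendrogram produces when given labels=names:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))
names = [f"item{i}" for i in range(len(X))]

Z = linkage(X, "average")
R = dendrogram(Z, no_plot=True)
leaves = R["leaves"]

# Solution 1 style: the ii-th label goes to the ii-th leaf drawn.
by_position = {leaves[ii]: names[ii] for ii in range(len(leaves))}
# Corrected style: each leaf id is treated as an observation index.
by_index = {leaf: names[leaf] for leaf in leaves}

ticks_position = [by_position[leaf] for leaf in leaves]
ticks_index = [by_index[leaf] for leaf in leaves]
reference = dendrogram(Z, labels=names, no_plot=True)["ivl"]

print(ticks_position)  # always names in list order
print(ticks_index)
print(ticks_index == list(reference))
```

Positional mapping is still reasonable for truncated plots like Solution 1, where the labels name merged clusters rather than individual observations.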