Many times there is a need to define your distance function. I found this answer in StackOverflow very helpful and for that reason, I posted here as a tip.
All of the SciPy
hierarchical clustering routines will accept a custom distance function that accepts two 1D vectors specifying a pair of points and returns a scalar. For example, using fclusterdata
:
import numpy as np from scipy.cluster.hierarchy import fclusterdata # a custom function that just computes Euclidean distance def mydist(p1, p2): diff = p1 - p2 return np.vdot(diff, diff) ** 0.5 X = np.random.randn(100, 2) fclust1 = fclusterdata(X, 1.0, metric=mydist) fclust2 = fclusterdata(X, 1.0, metric='euclidean') print(np.allclose(fclust1, fclust2)) # True
Valid inputs for the metric=
kwarg are the same as for scipy.spatial.distance.pdist. Also here you can find some other info