The logic
Most images are stored as an RGB array, so each pixel is a point in a three-dimensional color space and we can apply K-Means clustering to the pixels directly. The center of each cluster then represents one of the dominant colors of the image.
Load the Image
We will load the image with matplotlib.image and then create a Pandas DataFrame with three columns (red, green, blue) by iterating over the image pixels.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.image as img
from scipy.cluster.vq import kmeans, vq
%matplotlib inline

image = img.imread("landscape_1.jpeg")
image.shape
(700, 1050, 3)
plt.imshow(image)
r = []
g = []
b = []
for row in image:
    for pixel in row:
        # A pixel contains RGB values
        r.append(pixel[0])
        g.append(pixel[1])
        b.append(pixel[2])

df = pd.DataFrame({'red': r, 'green': g, 'blue': b})
df.head()
| | red | green | blue |
|---|---|---|---|
| 0 | 119 | 146 | 163 |
| 1 | 119 | 146 | 163 |
| 2 | 119 | 146 | 163 |
| 3 | 120 | 147 | 164 |
| 4 | 120 | 147 | 164 |
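The nested loop is easy to follow but slow on large images. As a sketch of an alternative, the same table can be built with a single NumPy reshape. The small random array below is a hypothetical stand-in for the loaded image, used only so that the snippet runs without the file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded image: a tiny random RGB array
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

# Flatten (height, width, 3) into (height * width, 3): one row per pixel,
# in the same row-by-row order as the nested loop
pixels = image.reshape(-1, 3)
df = pd.DataFrame(pixels, columns=['red', 'green', 'blue'])
print(df.shape)
```

Note that matplotlib usually returns PNG files as floats in [0, 1], sometimes with a fourth alpha channel, so for such files it may be necessary to keep only the first three channels with image[..., :3] before reshaping.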
Elbow Method
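Next, we apply the Elbow Method to choose the number of clusters K: we run K-Means for several values of K and look for the point where the distortion stops dropping sharply.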
distortions = []
num_clusters = range(1, 7)

# Create a list of distortions from the kmeans function
for i in num_clusters:
    cluster_centers, distortion = kmeans(df[['red', 'green', 'blue']].values.astype(float), i)
    distortions.append(distortion)

# Create a data frame with two lists - num_clusters, distortions
elbow_plot = pd.DataFrame({'num_clusters': num_clusters, 'distortions': distortions})

# Create a line plot of num_clusters and distortions
sns.lineplot(x='num_clusters', y='distortions', data=elbow_plot)
plt.xticks(num_clusters)
plt.show()
As we can see from the plot, the elbow is at K=2, so we will use two clusters.
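Reading the elbow off a plot is a judgment call. One rough way to automate it, shown here only as a heuristic sketch and not as part of the original method, is to pick the K where the distortion curve bends most sharply, i.e. where its second difference is largest. The two synthetic color blobs below are made-up data standing in for real pixels.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Hypothetical pixel data: two well-separated color blobs
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[60, 120, 60], scale=5, size=(200, 3))
blob_b = rng.normal(loc=[220, 220, 225], scale=5, size=(200, 3))
pixels = np.vstack([blob_a, blob_b])

ks = list(range(1, 7))
# kmeans returns (centers, distortion); keep only the distortion
distortions = [kmeans(pixels, k)[1] for k in ks]

# Second difference d[k-1] - 2*d[k] + d[k+1], defined for the interior values of K
second_diff = np.diff(distortions, n=2)
elbow_k = ks[int(np.argmax(second_diff)) + 1]
print(elbow_k)
```

On real photographs the curve is usually smoother than on this toy data, so the heuristic is best treated as a starting point to check against the plot.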
K-Means and Dominant Colors
The dominant colors are the cluster centers. Let’s get them:
cluster_centers, _ = kmeans(df[['red', 'green', 'blue']].values.astype(float), 2)
cluster_centers
array([[173.85432863, 136.00392373, 79.91006256],
[223.40796294, 224.50774782, 224.42501677]])
We need to reshape them to \(1\times k \times 3\) where k is the number of clusters.
plt.imshow(cluster_centers.reshape(1,2,3)/255.)
As we can see, we get the dominant colors with just a few lines of code!
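If the colors are needed outside of a plot, for example in CSS, the centers can be rounded and written as hex codes. This is a small sketch that reuses the cluster centers printed above; the rounding and clipping steps are choices made for this example.

```python
import numpy as np

# Cluster centers copied from the output shown earlier in the post
cluster_centers = np.array([[173.85432863, 136.00392373, 79.91006256],
                            [223.40796294, 224.50774782, 224.42501677]])

# Round to the nearest integer and clip to the valid 0-255 range
rgb = np.clip(np.rint(cluster_centers), 0, 255).astype(int)

# Format each (r, g, b) triple as a #rrggbb string
hex_codes = ['#{:02x}{:02x}{:02x}'.format(*color) for color in rgb.tolist()]
print(hex_codes)  # ['#ae8850', '#dfe1e0']
```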
NB: If we want the cluster label of each pixel, we can apply the vq function, which takes the observations to label and the cluster centers as inputs.
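# Assign cluster labels (select the RGB columns explicitly so the line can be re-run)
df['clusters'] = vq(df[['red', 'green', 'blue']].values.astype(float), cluster_centers)[0]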
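The labels also tell us how dominant each color is, since we can count how many pixels fall into each cluster. The sketch below uses made-up pixel data (300 dark and 100 light pixels) in place of the image, so the numbers are only illustrative.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Hypothetical pixels: 300 dark and 100 light, standing in for a real image
rng = np.random.default_rng(1)
dark = rng.normal(loc=[40, 40, 40], scale=4, size=(300, 3))
light = rng.normal(loc=[230, 230, 230], scale=4, size=(100, 3))
pixels = np.vstack([dark, light])

cluster_centers, _ = kmeans(pixels, 2)
labels, _ = vq(pixels, cluster_centers)

# Count pixels per cluster and convert the counts to shares of the image
counts = np.bincount(labels, minlength=len(cluster_centers))
shares = counts / counts.sum()

# List the centers from most to least dominant
order = np.argsort(shares)[::-1]
for idx in order:
    print(np.rint(cluster_centers[idx]).astype(int), round(float(shares[idx]), 2))
```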
2 thoughts on “Get the Dominant Colors of an Image with K-Means”
Thank you
Hi, thanks for the awesome explanation. I got stuck with an error on some images:
cluster_centers, _ = kmeans(df1.values.astype(float), cluster_size)
File "/home/roopesh/anaconda3/envs/newPython3.8/lib/python3.8/site-packages/scipy/cluster/vq.py", line 480, in kmeans
    guess = _kpoints(obs, k, rng)
File "/home/roopesh/anaconda3/envs/newPython3.8/lib/python3.8/site-packages/scipy/cluster/vq.py", line 508, in _kpoints
    idx = rng.choice(data.shape[0], size=k, replace=False)
File "mtrand.pyx", line 954, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'
I got this error. Can you help me out? Many thanks.
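A possible cause, offered as a guess and not as a confirmed diagnosis of these images: the traceback shows kmeans drawing k initial centers from the rows of the data without replacement, so this error would appear whenever the array has fewer rows than k, for example when the data frame is empty or very small for some images. The sketch below guards against that case. The helper name kmeans_capped is hypothetical and introduced only for this example.

```python
import numpy as np
from scipy.cluster.vq import kmeans

def kmeans_capped(pixels, k):
    """Hypothetical helper: run kmeans with k capped at the number of rows."""
    obs = np.asarray(pixels, dtype=float)
    if obs.ndim != 2 or obs.shape[0] == 0:
        raise ValueError(f"expected a non-empty 2-D array, got shape {obs.shape}")
    # Never ask for more clusters than there are observations
    k = min(k, obs.shape[0])
    return kmeans(obs, k)

# Only two pixels but five clusters requested: k is reduced to 2
tiny = np.array([[10, 20, 30], [200, 210, 220]])
centers, distortion = kmeans_capped(tiny, 5)
print(centers.shape)
```

Printing df1.shape just before the failing call would show whether this is what happens for the affected images.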