Predictive Hacks

Get the Dominant Colors of an Image with K-Means


The logic

Most images can be represented as an RGB array, with one (red, green, blue) triple per pixel, so we can apply K-Means clustering directly to the pixels. The center of each cluster is then one of the dominant colors of the image.


Load the Image

We will load the image using matplotlib.image and then create a Pandas DataFrame with three columns (red, green, blue) by iterating over the image pixels.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.image as img
from scipy.cluster.vq import kmeans, vq
%matplotlib inline 

image = img.imread("landscape_1.jpeg")

image.shape
 
(700, 1050, 3)
plt.imshow(image)
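
One detail worth checking at this point is the array's dtype and channel count. As far as we know, matplotlib's imread returns uint8 values on the 0-255 scale for JPEG files but float32 values on the 0-1 scale for PNG files, and a PNG may carry a fourth alpha channel. The helper below, `to_rgb_255` (a hypothetical name that is not part of this post), is a minimal sketch that normalizes both cases so the later steps can assume 0-255 RGB values. It is demonstrated on small synthetic arrays rather than the real image.

```python
import numpy as np

def to_rgb_255(image):
    """Return an (H, W, 3) float array with values on the 0-255 scale.

    Hypothetical helper: assumes integer arrays are already 0-255 and
    floating-point arrays are on the 0-1 scale (as with PNG input).
    """
    arr = np.asarray(image)
    if arr.ndim != 3 or arr.shape[2] < 3:
        raise ValueError("expected an (H, W, 3) or (H, W, 4) array")
    rgb = arr[..., :3]  # drop the alpha channel if there is one
    if np.issubdtype(rgb.dtype, np.floating):
        return rgb.astype(float) * 255.0
    return rgb.astype(float)

# Synthetic stand-ins for a JPEG-style and a PNG-style array
jpeg_like = np.full((2, 3, 3), 128, dtype=np.uint8)
png_like = np.full((2, 3, 4), 0.5, dtype=np.float32)
print(to_rgb_255(jpeg_like).shape, to_rgb_255(png_like).max())
```

For the JPEG used in this post the helper would only convert the values to float, so it matters mainly when the same code is pointed at a PNG.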

r = []
g = []
b = []

for row in image:
    for pixel in row:
        # A pixel contains RGB values
        r.append(pixel[0])
        g.append(pixel[1])
        b.append(pixel[2])

df = pd.DataFrame({'red':r, 'green':g, 'blue':b})

df.head()
 
   red  green  blue
0  119    146   163
1  119    146   163
2  119    146   163
3  120    147   164
4  120    147   164
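
The nested loop above visits every pixel in Python, which can be slow on a 700 x 1050 image. A possible alternative, sketched here on a small random array standing in for the real image, is to flatten the two spatial axes with NumPy's `reshape` and build the DataFrame in one step. The names `demo_image`, `demo_df` and `loop_df` are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-in for the loaded image: a small random (H, W, 3) uint8 array
rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

# Flatten the two spatial axes into one; each row is one pixel's R, G, B
pixels = demo_image.reshape(-1, 3)
demo_df = pd.DataFrame(pixels, columns=['red', 'green', 'blue'])

# Same construction as the nested loop, kept here only for comparison
r, g, b = [], [], []
for row in demo_image:
    for pixel in row:
        r.append(pixel[0])
        g.append(pixel[1])
        b.append(pixel[2])
loop_df = pd.DataFrame({'red': r, 'green': g, 'blue': b})

print(demo_df.shape)
print((demo_df.values == loop_df.values).all())
```

Both approaches list the pixels in the same row-by-row order, so the resulting data frames hold the same values.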


Elbow Method

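Now we will apply the Elbow Method to choose the number of clusters K: we run K-Means for several values of K, record the distortion for each, and look for the point where adding more clusters stops reducing the distortion by much.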

distortions = []
num_clusters = range(1, 7)

# Create a list of distortions from the kmeans function
for i in num_clusters:
    cluster_centers, distortion = kmeans(df[['red','green','blue']].values.astype(float), i)
    distortions.append(distortion)

# Create a data frame with two lists - num_clusters, distortions
elbow_plot = pd.DataFrame({'num_clusters': num_clusters, 'distortions': distortions})

# Create a line plot of num_clusters and distortions
sns.lineplot(x='num_clusters', y='distortions', data = elbow_plot)
plt.xticks(num_clusters)
plt.show()
 

As we can see from the plot, the elbow is at K=2, so we will use two clusters.
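
Reading the elbow off a plot is somewhat subjective. A rough numeric cross-check, sketched below on synthetic data, is to measure how much the distortion drops each time a cluster is added and pick the K that produced the largest drop. This largest-drop heuristic is an assumption of the sketch and not the method used in this post, and the two color blobs are made up for illustration. It can mislead on images with many similar tones.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Synthetic "pixels": two well-separated color blobs (illustrative data only)
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(60, 120, 200), scale=5, size=(300, 3))
blob_b = rng.normal(loc=(220, 220, 220), scale=5, size=(300, 3))
pixels = np.vstack([blob_a, blob_b])

ks = list(range(1, 7))
# kmeans returns (centers, distortion); keep only the distortion
distortions = [kmeans(pixels, k)[1] for k in ks]

# Drop in distortion gained by moving from k-1 to k clusters
drops = -np.diff(distortions)
# The k that produced the largest single drop is a rough elbow estimate
elbow_k = ks[int(np.argmax(drops)) + 1]
print(elbow_k)
```

On the real image, `distortions` from the loop above could be passed through the same two lines to compare the result against the plot.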


K-Means and Dominant Colors

The dominant colors are the cluster centers. Let’s get them:

cluster_centers, _ = kmeans(df[['red','green','blue']].values.astype(float), 2)
cluster_centers
 
array([[173.85432863, 136.00392373,  79.91006256],
       [223.40796294, 224.50774782, 224.42501677]])

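To display them with imshow, we reshape the centers to \(1\times k \times 3\) (one row of k pixels with three channels each), where k is the number of clusters, and divide by 255 so that the values fall in the 0-1 range imshow expects for floats.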

plt.imshow(cluster_centers.reshape(1,2,3)/255.)
 

As we can see, we get the dominant colors with just a few lines of code!
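
If the colors are needed outside a plot, for example in a stylesheet, the cluster centers can be turned into hex codes. The sketch below uses the centers reported above, rounded to two decimals, and a hypothetical helper `to_hex` that is not part of this post.

```python
import numpy as np

# Cluster centers reported above, rounded for brevity
centers = np.array([[173.85, 136.00, 79.91],
                    [223.41, 224.51, 224.43]])

def to_hex(rgb):
    """Format one RGB triple (0-255 scale) as a '#rrggbb' string."""
    r, g, b = (int(v) for v in np.clip(np.rint(rgb), 0, 255))
    return f"#{r:02x}{g:02x}{b:02x}"

hex_codes = [to_hex(c) for c in centers]
print(hex_codes)  # ['#ae8850', '#dfe1e0']
```

Rounding to the nearest integer is one reasonable choice here; truncating would shift some channels by one.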

NB: If we also want the cluster label of each pixel, we can use the vq function. It takes the observations to label and the cluster centers, and returns the index of the nearest center for each observation.

# Assign each pixel to its nearest cluster center (use only the RGB columns)
df['clusters'] = vq(df[['red','green','blue']].values.astype(float), cluster_centers)[0]
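
With the labels in hand, one natural follow-up is to rank the colors by how many pixels fall into each cluster. The sketch below does this with `np.bincount` on a tiny made-up pixel array; the centers are the ones reported above (rounded), and the four pixels are illustrative values rather than data from the image.

```python
import numpy as np
from scipy.cluster.vq import vq

# Cluster centers from above (rounded) and a tiny stand-in pixel array
centers = np.array([[173.85, 136.00, 79.91],
                    [223.41, 224.51, 224.43]])
pixels = np.array([[170, 130, 80],
                   [225, 225, 225],
                   [220, 222, 221],
                   [230, 228, 226]], dtype=float)

labels, _ = vq(pixels, centers)        # index of the nearest center per pixel
counts = np.bincount(labels, minlength=len(centers))
shares = counts / counts.sum()         # fraction of pixels in each cluster
order = np.argsort(shares)[::-1]       # most dominant cluster first

for idx in order:
    print(centers[idx].round(), f"{shares[idx]:.0%}")
```

On the real data, the same counts are available from `df['clusters'].value_counts()` once the labels column has been assigned.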
 
