Predictive Hacks

How to Generate Image Captions with Rekognition and OpenAI

Recently, text-to-image models such as the DALL-E and the Stable Diffusion got the attention of the hi-tech community. These deep learning models generate images from natural language descriptions, called “prompts”. In case you are interested in generating and saving images with DALL-E using the OpenAI API, you can have a look at the related tutorial.

In this tutorial, we will try to achieve the opposite, meaning that our goal is to generate text from images. Our approach will be to get the labels from the image using the AWS Rekognition and then pass them to OpenAI to generate the image caption.

Image Caption Generator

We will build a function that takes an image as input and it returns the image caption. The image caption generator function follows the steps below:

  • It takes an image as input (the path of the image that is stored locally)
  • It creates a tmp folder in order to save a tmp.jpg image
  • It resizes the original and saves it as a tmp.jpg in order to make sure that the image is not too big for the AWS Rekognition, since there is a limitation
  • It keeps the image labels with a confidence greater than 0.7, and it stores them in a list
  • It passes the labels to OpenAI API using the proper prompt
  • It returns the caption of the image

Let’s dive into the coding part.

import openai
import os
openai.api_key = os.getenv('OpenAI')
import boto3
from PIL import Image

client=boto3.client('rekognition')

def image_caption_generator(image_path):
    
    # create a tmp folder in order to save the resized input image
    if not os.path.exists('tmp'):
        os.makedirs('tmp')
    
    # Open the original image
    img = Image.open(image_path)  

    # Set the desired size for the resized image
    new_size = (100, 100)  

    # Resize the image
    resized_img = img.resize(new_size)

    # Save the resized image
    resized_img.save('tmp/tmp.jpg') 
    
    with open('tmp.jpg', 'rb') as image:
        response = client.detect_labels(Image={'Bytes': image.read()})
    
    image_labels = []
    for label in response['Labels']:
        if label['Confidence']>70:
            image_labels.append(label['Name'].lower())

    # Generate a prompt by concatenating the image labels
    prompt = 'Generate an image caption for the following image labels: ' + ', '.join(image_labels)

    # Use the OpenAI API to generate image captions
    response = openai.Completion.create(
      model='text-davinci-003',
      prompt=prompt,
      temperature=0.5,
      max_tokens=50
    )

    # Extract the generated image captions from the API response
    generated_captions = response['choices'][0]['text']
    
    output = 'Generated Image Captions:\n'+generated_captions

    return output
 

Now it is time to test the image generator. We will try the cat-dog.jpg image.

print(image_caption_generator('cat-dog.jpg'))

And we get:

Generated Image Captions:


This golden retriever and cat are the best of friends, making them the perfect pet combination!
 

So, the output was:

Generated Image Captions:

This golden retriever and cat are the best of friends, making them the perfect pet combination!

Great job!

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Leave a Comment

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

Python

Image Captioning with HuggingFace

Image captioning with AI is a fascinating application of artificial intelligence (AI) that involves generating textual descriptions for images automatically.

Python

Intro to Chatbots with HuggingFace

In this tutorial, we will show you how to use the Transformers library from HuggingFace to build chatbot pipelines. Let’s