Predictive Hacks

How to Connect External Data with GPT-3 using LlamaIndex

In this tutorial, we will show you how to connect external data to OpenAI’s GPT-3 using LlamaIndex. As an example, we will use the book Alice’s Adventures in Wonderland by Lewis Carroll. Once the book is connected to GPT-3, we can ask questions about it and get answers based on its content.

Installation and set-up

You can install the LlamaIndex library with pip:

pip install llama-index
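The imports used in this post (for example from llama_index import GPTVectorStoreIndex) follow the package layout of older LlamaIndex releases. Later releases renamed classes such as GPTVectorStoreIndex to VectorStoreIndex and, from 0.10 onward, moved the core classes under llama_index.core, so if an import fails it is worth checking which version pip installed and pinning the release you tested with. A small standard-library check, where installed_version is a hypothetical helper name:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "llama-index") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints something like a version string, or a notice if the package is absent.
print(installed_version() or "llama-index is not installed")
```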

You will also need an OpenAI API key. Store it in an environment variable called OPENAI_API_KEY, or set that variable from Python before building the index:

import os

# Set the OpenAI API key for this Python session (replace the placeholder with your key)
os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"
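Hard-coding the key is convenient in a notebook but easy to leak when the script is shared. A minimal sketch of an alternative, assuming the key is exported in the shell: read it from the environment and fail early with a clear message, before any indexing call reaches OpenAI. The helper require_api_key is hypothetical and not part of LlamaIndex.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: return the key from the environment or fail fast."""
    key = os.environ.get(var_name, "").strip()
    # Treat an empty value or the tutorial's placeholder as "not configured".
    if not key or key == "INSERT OPENAI KEY":
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell or set os.environ[{var_name!r}]"
        )
    return key


# Usage: call require_api_key() once at the top of the script, before building an index.
```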
 

LlamaIndex Usage Pattern

The general usage pattern of LlamaIndex is as follows:

  1. Load in documents (either manually, or through a data loader)
  2. Construct Index (from Nodes or Documents)
  3. Query the index

The first step is to load the document. The book is a plain-text file called alice_in_wonderland.txt, stored in the data folder.

We can load the document by running:

from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
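Conceptually, SimpleDirectoryReader('data').load_data() reads the files in the folder and returns a list of Document objects, each holding the text plus some metadata. The sketch below mimics that for plain-text files only; SimpleDoc and load_text_files are hypothetical names, and the real reader also handles other file types such as PDF.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a LlamaIndex Document: text plus metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def load_text_files(folder: str, suffix: str = ".txt") -> List[SimpleDoc]:
    """Read every file ending in `suffix` directly inside `folder` into a SimpleDoc."""
    docs = []
    for path in sorted(Path(folder).glob(f"*{suffix}")):
        docs.append(SimpleDoc(path.read_text(encoding="utf-8"), {"file_name": path.name}))
    return docs


# Only runs when a local "data" folder exists, as in the tutorial.
if Path("data").is_dir():
    for doc in load_text_files("data"):
        print(doc.metadata["file_name"], len(doc.text), "characters")
```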
 

Index Construction

We can construct an index over this document as follows:

from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(documents)
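GPTVectorStoreIndex splits the documents into chunks (nodes), computes an embedding for each chunk (with OpenAI embeddings by default), and at query time returns the chunks whose embeddings are most similar to the query’s. The toy version below keeps that shape but swaps the embedding model for plain word counts and cosine similarity, so it runs offline. ToyVectorIndex, embed and chunk are hypothetical names, and the library’s own splitter works on tokens rather than words.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str, size: int = 40) -> List[str]:
    """Split text into consecutive chunks of at most `size` words (no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class ToyVectorIndex:
    def __init__(self, texts: List[str]):
        # Each node is stored together with its precomputed vector.
        self.nodes = [(t, embed(t)) for t in texts]

    def retrieve(self, query: str, top_k: int = 2) -> List[Tuple[str, float]]:
        """Return the top_k (text, score) pairs, best first."""
        q = embed(query)
        scored = [(text, cosine(q, vec)) for text, vec in self.nodes]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


index = ToyVectorIndex([
    "Alice follows the White Rabbit down the rabbit hole.",
    "The Mad Hatter hosts a tea party with the March Hare.",
    "The Queen of Hearts plays croquet with flamingos.",
])
print(index.retrieve("who went down the rabbit hole?", top_k=1))
```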
 

Save and Load the Index

By default, the index is kept in memory. To save it to disk and load it back later, we can run:

from llama_index import StorageContext, load_index_from_storage

# write the index to disk
index.storage_context.persist(persist_dir='./storage')
# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir='./storage')

# load index
index = load_index_from_storage(storage_context)
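Persisting pays off because loading the stored index avoids recomputing (and paying for) the embeddings on every run. A common pattern is to build once and load afterwards. The sketch below shows that round trip for a toy list of nodes stored as JSON; nodes.json and the helper names are hypothetical and do not mirror LlamaIndex’s actual on-disk format.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

Node = Dict[str, str]


def persist_nodes(nodes: List[Node], persist_dir: str) -> None:
    """Write nodes to <persist_dir>/nodes.json, creating the folder if needed."""
    folder = Path(persist_dir)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "nodes.json").write_text(json.dumps(nodes), encoding="utf-8")


def load_nodes(persist_dir: str) -> List[Node]:
    """Read the nodes back from <persist_dir>/nodes.json."""
    return json.loads((Path(persist_dir) / "nodes.json").read_text(encoding="utf-8"))


def load_or_build(persist_dir: str, build: Callable[[], List[Node]]) -> List[Node]:
    """Reuse the stored nodes when present; otherwise build them once and persist."""
    if (Path(persist_dir) / "nodes.json").exists():
        return load_nodes(persist_dir)
    nodes = build()
    persist_nodes(nodes, persist_dir)
    return nodes
```

With LlamaIndex itself, the same check-then-load idea would wrap index.storage_context.persist(...) and load_index_from_storage(...).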
 

Query the Index

High-Level API

Now we are ready to ask questions and get answers using the high-level API. For example, let’s ask for the plot of the book.

What is the plot of the book?

query_engine = index.as_query_engine()
response = query_engine.query("What is the plot of the book?")
print(response.response)
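Under the hood, the query engine returned by as_query_engine() retrieves the chunks most similar to the question, places them in a question-answering prompt together with the question, and sends that prompt to the LLM. The sketch below shows only the prompt-assembly step; the template wording is illustrative and is not LlamaIndex’s actual default template, and no API call is made.

```python
from typing import List

# Illustrative template (an assumption, not the library's exact default wording).
QA_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context:\n"
    "{context}\n"
    "Question: {question}\n"
    "Answer:"
)


def build_qa_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Join the retrieved chunks and slot them, with the question, into the template."""
    context = "\n\n".join(retrieved_chunks)
    return QA_TEMPLATE.format(context=context, question=question)


prompt = build_qa_prompt(
    "What is the plot of the book?",
    ["Alice follows the White Rabbit down the rabbit hole.",
     "The Queen of Hearts orders executions for the slightest offense."],
)
print(prompt)
```

This is also why the answers depend on which chunks were retrieved: the model only sees those chunks, not the whole book.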
 

Output:

The book follows Alice, a young girl, as she falls down a rabbit hole and embarks on a series of strange and surreal adventures. Along the way, she meets a variety of strange creatures, including a talking caterpillar, a dodo, a Cheshire cat, and a mad hatter. She also participates in a variety of activities, such as a caucus race, a croquet game, and a tea party. Throughout her journey, Alice must use her wit and courage to solve puzzles and overcome obstacles in order to find her way home.

Let’s try another question:

Give me a summary of the book

response = query_engine.query("Give me a summary of the book")
print(response.response)
 

Output:

Alice is a young girl who falls down a rabbit hole and finds herself in a strange world. She meets a variety of characters, including a Dodo, a Mouse, a King, a Queen, and a Knave. The Dodo organizes a race, but when it is over, no one knows who has won. Alice gives out comfits as prizes and the Mouse asks her to tell her story. Alice then tells the story of how she ended up in the strange world. The King and Queen then ask Alice to explain a poem, but she is unable to do so. The Queen then orders Alice’s head to be cut off, but Alice wakes up to find herself back in her sister’s lap.

Finally, let’s try one more question:

Give me a brief description of all characters of the book. I want the format to be Character –> Description

response = query_engine.query("""Give me a brief description of all characters of the book.\n
                                 I want the format to be Character --> Description""")
print(response.response)

Output:

Alice –> A young girl who falls down a rabbit hole and embarks on a fantastical journey through Wonderland.

White Rabbit –> A talking rabbit who Alice follows down the rabbit hole. He is always in a hurry and is late for important appointments.

Mock Turtle –> A talking turtle who Alice meets in Wonderland. He is sad and tells Alice stories about his past.

Duchess –> A character Alice meets in Wonderland who is rude and mean. She is accompanied by a Cheshire Cat.

Cheshire Cat –> A mysterious talking cat who appears and disappears at will. He is mischievous and often gives Alice cryptic advice.

Queen of Hearts –> The ruler of Wonderland who is always angry and orders people to be executed for the slightest offense.

Low-Level API

LlamaIndex also offers a low-level API for when we want to customize how a query is executed, for example how many nodes are retrieved and which of them are passed on to the LLM:

from llama_index import (
    GPTVectorStoreIndex,
    ResponseSynthesizer,
)
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.indices.postprocessor import SimilarityPostprocessor

# build index
index = GPTVectorStoreIndex.from_documents(documents)

# configure retriever
retriever = VectorIndexRetriever(
    index=index, 
    similarity_top_k=2,
)

# configure response synthesizer
response_synthesizer = ResponseSynthesizer.from_args(
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ]
)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

# query
response = query_engine.query("Give me a brief description of all characters of the book. I want the format to be Character --> Description")
print(response.response)
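In this configuration, VectorIndexRetriever returns the similarity_top_k=2 most similar nodes, and SimilarityPostprocessor(similarity_cutoff=0.7) then drops any of them whose similarity score is below 0.7 before the answer is synthesized. A plain-Python sketch of that two-stage filter over (text, score) pairs; the function name and the scores are made up for illustration:

```python
from typing import List, Tuple

ScoredNode = Tuple[str, float]


def retrieve_then_filter(
    scored: List[ScoredNode], top_k: int = 2, cutoff: float = 0.7
) -> List[ScoredNode]:
    """Keep the top_k highest-scoring nodes, then drop those scoring below `cutoff`."""
    top = sorted(scored, key=lambda node: node[1], reverse=True)[:top_k]
    return [node for node in top if node[1] >= cutoff]


candidates = [("rabbit hole", 0.82), ("tea party", 0.65), ("croquet", 0.71)]
print(retrieve_then_filter(candidates))  # [('rabbit hole', 0.82), ('croquet', 0.71)]
```

Two consequences follow: a small similarity_top_k limits how much of the book the model sees, which matters for questions such as “all characters”, and a strict cutoff can leave no nodes at all.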
 

Output:

Alice –> A young girl who falls down a rabbit hole and embarks on a fantastical journey through Wonderland.

The White Rabbit –> A talking rabbit who Alice follows down the rabbit hole. He is always in a hurry and is late for important appointments.

The Duchess –> A rude and unpleasant woman who Alice meets in Wonderland. She is accompanied by a baby and a cook.

The Cheshire Cat –> A mysterious talking cat who appears and disappears at will. He is known for his mischievous grin and his wise advice.

The Mad Hatter –> A strange character who hosts a tea party for Alice. He is known for his nonsensical riddles and his odd behavior.

The March Hare –> A hare who attends the Mad Hatter’s tea party. He is known for his erratic behavior and his love of tea.

The Queen of Hearts –> The ruler of Wonderland. She is known for her temper and her love of executing people.

The King of Hearts –> The Queen’s husband and the ruler of Wonderland. He is known for his meekness and his inability to stand up to the Queen.

List Index

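Keep in mind that we can also build a list index (GPTListIndex), which stores all the nodes of the documents in a sequential list and uses all of them at query time instead of only the most similar ones. This is helpful for tasks such as summaries, which need the whole text. Let’s ask for a summary of the book again for comparison.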

from llama_index import GPTListIndex

index = GPTListIndex.from_documents(documents)

query_engine = index.as_query_engine(
    response_mode="tree_summarize"
)
response = query_engine.query("Give me a summary of the book")
print(response.response)
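With response_mode="tree_summarize", the text is summarized bottom-up: groups of chunks are summarized, then those summaries are summarized, until one answer remains. A sketch of that recursion, with a hypothetical summarize callable standing in for the LLM call and a fixed fanout; the library instead decides the grouping based on how much text fits in the model’s prompt.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str], summarize: Callable[[List[str]], str], fanout: int = 2
) -> str:
    """Summarize `chunks` bottom-up, `fanout` texts per call, until one summary is left."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2, otherwise the tree never shrinks")
    texts = chunks
    while True:
        groups = [texts[i:i + fanout] for i in range(0, len(texts), fanout)]
        texts = [summarize(group) for group in groups]  # one LLM call per group
        if len(texts) == 1:
            return texts[0]


# Stand-in "LLM" that just joins its inputs, so the result is easy to check.
calls: List[List[str]] = []


def fake_summarize(group: List[str]) -> str:
    calls.append(group)
    return " ".join(group)


print(tree_summarize(["c1", "c2", "c3", "c4", "c5"], fake_summarize))  # c1 c2 c3 c4 c5
print(len(calls), "summarize calls")  # 6 summarize calls
```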
 

Output:

Alice’s Adventures in Wonderland is a classic children’s novel by Lewis Carroll. The story follows Alice, a young girl who falls down a rabbit hole into a fantastical world populated by talking animals and other strange creatures. Through her adventures, Alice meets a variety of characters, including the White Rabbit, the Cheshire Cat, the Mad Hatter, the Queen of Hearts, and the Mock Turtle. She embarks on a series of adventures, during which she grows and shrinks in size, meets a variety of characters, and solves puzzles. Along the way, she learns valuable lessons about life and discovers her own identity. In the end, Alice wakes up from her dream and returns to reality, but she never forgets the wonderful experiences she had in Wonderland.

The Takeaway

With LlamaIndex, we can unlock the power of LLMs by connecting them to our own data. The data can come in different formats, such as TXT, PDF and HTML, and from different sources such as Notion, Slack, Twitter and LinkedIn. It can also be, for example, customer reviews or forum posts.

In the next tutorial, we will show you how to connect customer data to build chatbots using the LangChain library. Stay tuned!
