The output of an LLM is plain text. However, we often want structured responses so that we can analyze them more easily. The LangChain library contains several output parser classes that can structure LLM responses. The two main methods of the output parser classes are:
- “Get format instructions”: A method that returns a string with instructions about the format of the LLM output
- “Parse”: A method that parses the unstructured response from the LLM into a structured format
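To make this two-method contract concrete, here is a minimal sketch that uses only the standard library. `ToyListParser` is a hypothetical class written for illustration; it is not LangChain's implementation and only mimics the shape of the interface.

```python
import json
from typing import List


class ToyListParser:
    """Hypothetical stand-in that mimics the two-method parser interface."""

    def get_format_instructions(self) -> str:
        # Text meant to be appended to the prompt so the model knows the format.
        return 'Respond with a JSON array of strings, e.g. ["a", "b"].'

    def parse(self, text: str) -> List[str]:
        # Turn the raw model reply into a Python list, failing loudly otherwise.
        data = json.loads(text)
        if not isinstance(data, list) or not all(isinstance(x, str) for x in data):
            raise ValueError("expected a JSON array of strings")
        return data


toy_parser = ToyListParser()
reply = '["Moscow", "Los Angeles"]'  # made-up reply standing in for LLM output
cities = toy_parser.parse(reply)
print(toy_parser.get_format_instructions())
print(cities)
```

The instructions go into the prompt, and the same object later converts the reply, which keeps the requested format and the parsing logic in one place.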
You can find an explanation of the output parsers, with examples, in the LangChain documentation. In this tutorial, we will show you something that is not covered in the documentation: how to generate a list of different objects as structured output.
Example of Structured Outputs of Lists and Dictionaries
Let’s say that I would like to get the following information:
- The year of the Olympics
- The location of the Olympics
- The top-3 countries in terms of gold medals
- The gold medals of the top-3 countries
We would like the output of the LLM to be a JSON object whose keys are the required outputs, such as year, location and so on, and whose values are either lists of strings (for year and location) or lists of dictionaries (for the top-3 countries and their corresponding medals).
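As a rough picture of that target shape, the sketch below builds the structure for a single Games with the standard library only. The key names match the fields defined later in this tutorial, and the values are copied from the model output shown further down.

```python
import json

# One Games only, to show the shape of the desired JSON.
target = {
    "year": ["1980"],
    "location": ["Moscow"],
    "countries": [{"1st": "Soviet Union", "2nd": "East Germany", "3rd": "Bulgaria"}],
    "medals": [{"1st": 80, "2nd": 47, "3rd": 41}],
}

print(json.dumps(target, indent=2))
```

Each key holds a list, and entries at the same position across the four lists describe the same Games.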
Let’s start coding by loading the required libraries:
from langchain.prompts import (
    PromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List, Dict, TypedDict

chat_model = ChatOpenAI(temperature=0)
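We will create a Pydantic model called OlympicMedals, which we will then pass to the PydanticOutputParser. Pay attention to the way that we define the fields. It is also important to pass a description for each field, because the descriptions become part of the format instructions that are sent to the LLM.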
class OlympicMedals(BaseModel):
    year: List[str] = Field(description="a list that shows the year that the Olympics took place")
    location: List[str] = Field(description="a list of cities where the Olympics took place")
    countries: List[TypedDict("countries", {"1st": str, "2nd": str, "3rd": str})] = Field(description="The top 3 countries in terms of gold medals in Olympics")
    medals: List[TypedDict("medals", {"1st": int, "2nd": int, "3rd": int})] = Field(description="The number of gold medals for the top 3 countries in Olympics")
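The fields above use the functional form of `TypedDict` because keys such as `1st` are not valid Python identifiers, so they cannot be declared with the class syntax. A small standard-library sketch of that form follows; the name `Podium` is illustrative and does not appear in the tutorial's code.

```python
from typing import TypedDict

# Functional syntax: keys are given as strings, so "1st" is allowed.
Podium = TypedDict("Podium", {"1st": str, "2nd": str, "3rd": str})

# At runtime a TypedDict value is an ordinary dict; the types are declarations only.
row: Podium = {"1st": "Soviet Union", "2nd": "East Germany", "3rd": "Bulgaria"}

print(sorted(Podium.__annotations__))
print(type(row))
```

On its own, `TypedDict` does not validate anything at runtime; in the tutorial's setup it is Pydantic that checks the values when the model is built.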
Then, we have to set up the parser and inject the instructions into the prompt template:
parser = PydanticOutputParser(pydantic_object=OlympicMedals)
format_instructions = parser.get_format_instructions()
We can see the format_instructions by printing them:
print(format_instructions)
Output:
The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
Here is the output schema:
```
{"properties": {"year": {"title": "Year", "description": "a list that shows the year that the Olympics took place", "type": "array", "items": {"type": "string"}}, "location": {"title": "Location", "description": "a list of cities where the Olympics took place", "type": "array", "items": {"type": "string"}}, "countries": {"title": "Countries", "description": "The top 3 countries in terms of gold medals in Olympics", "type": "array", "items": {"$ref": "#/definitions/countries"}}, "medals": {"title": "Medals", "description": "The number of gold medals for the top 3 countries in Olympics", "type": "array", "items": {"$ref": "#/definitions/medals"}}}, "required": ["year", "location", "countries", "medals"], "definitions": {"countries": {"title": "countries", "type": "object", "properties": {"1st": {"title": "1St", "type": "string"}, "2nd": {"title": "2Nd", "type": "string"}, "3rd": {"title": "3Rd", "type": "string"}}, "required": ["1st", "2nd", "3rd"]}, "medals": {"title": "medals", "type": "object", "properties": {"1st": {"title": "1St", "type": "integer"}, "2nd": {"title": "2Nd", "type": "integer"}, "3rd": {"title": "3Rd", "type": "integer"}}, "required": ["1st", "2nd", "3rd"]}}}
```
At this point, we can build the prompt using the ChatPromptTemplate:
prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template(
            "answer the users question as best as possible.\n{format_instructions}\n{question}"
        )
    ],
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
)
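What the template does can be approximated with plain string formatting: `format_instructions` is filled in once up front (the role of `partial_variables`), and `question` is supplied on each call. The sketch below uses only `str.format` and a shortened, made-up instruction string, so it runs without LangChain. It illustrates the idea rather than the library's internals, and `build_prompt` is a hypothetical helper.

```python
template = (
    "answer the users question as best as possible.\n"
    "{format_instructions}\n"
    "{question}"
)

# Shortened placeholder; the real text comes from parser.get_format_instructions().
demo_instructions = "The output should be formatted as a JSON instance."


def build_prompt(question: str) -> str:
    # format_instructions is fixed ahead of time; only the question varies per call.
    return template.format(format_instructions=demo_instructions, question=question)


prompt_text = build_prompt("Where were the 1980 Olympics held?")
print(prompt_text)
```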
Now, we can pass the question into the prompt template. The question is:
For the olympic games in 1980, 1984, 1988, 1992, 1996, 2000, 2004, 2008, 2012 and 2016, return the top 3 countries in terms of gold medals, the year, the number of gold medals and the location of the Olympics
_input = prompt.format_prompt(
    question="For the olympic games in 1980, 1984, 1988, 1992, 1996, 2000, 2004, 2008, 2012 and 2016, return the top 3 countries in terms of gold medals, the year, the number of gold medals and the location of the Olympics"
)
output = chat_model(_input.to_messages())
Finally, we can parse the content of the output as follows:
my_output = parser.parse(output.content)
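The model's reply is free text, so parsing can fail when the reply is not valid JSON or is wrapped in extra prose. Below is a minimal sketch of a defensive pre-check that uses only the standard library. `extract_json_object` is a hypothetical helper, not a LangChain function, and the sample replies are made up.

```python
import json
import re
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Hypothetical helper: return the outermost {...} span in text as a dict, or None."""
    # Greedy match from the first "{" to the last "}" so nested objects stay intact.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


chatty_reply = 'Sure! Here is the data:\n{"year": ["1980"], "location": ["Moscow"]}\nHope that helps.'
found = extract_json_object(chatty_reply)
missing = extract_json_object("I cannot answer that.")
print(found)
print(missing)
```

A `None` result is a signal to retry the request or log the raw reply instead of letting an exception surface deep inside the pipeline.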
Let’s print my_output:
print(my_output)
Output:
year=['1980', '1984', '1988', '1992', '1996', '2000', '2004', '2008', '2012', '2016'] location=['Moscow', 'Los Angeles', 'Seoul', 'Barcelona', 'Atlanta', 'Sydney', 'Athens', 'Beijing', 'London', 'Rio de Janeiro'] countries=[{'1st': 'Soviet Union', '2nd': 'East Germany', '3rd': 'Bulgaria'}, {'1st': 'United States', '2nd': 'West Germany', '3rd': 'Romania'}, {'1st': 'Soviet Union', '2nd': 'East Germany', '3rd': 'United States'}, {'1st': 'Unified Team', '2nd': 'United States', '3rd': 'Germany'}, {'1st': 'United States', '2nd': 'Russia', '3rd': 'Germany'}, {'1st': 'United States', '2nd': 'Russia', '3rd': 'China'}, {'1st': 'United States', '2nd': 'Russia', '3rd': 'China'}, {'1st': 'China', '2nd': 'United States', '3rd': 'Great Britain'}, {'1st': 'United States', '2nd': 'China', '3rd': 'Russia'}, {'1st': 'United States', '2nd': 'Great Britain', '3rd': 'China'}] medals=[{'1st': 80, '2nd': 47, '3rd': 41}, {'1st': 83, '2nd': 61, '3rd': 30}, {'1st': 55, '2nd': 37, '3rd': 30}, {'1st': 45, '2nd': 37, '3rd': 26}, {'1st': 44, '2nd': 32, '3rd': 25}, {'1st': 37, '2nd': 32, '3rd': 27}, {'1st': 36, '2nd': 32, '3rd': 27}, {'1st': 51, '2nd': 36, '3rd': 29}, {'1st': 46, '2nd': 38, '3rd': 29}, {'1st': 46, '2nd': 27, '3rd': 26}]
Note that the type of my_output is OlympicMedals, so we can easily extract the value of each field. For example:
print(my_output.countries)
Output:
[{'1st': 'Soviet Union', '2nd': 'East Germany', '3rd': 'Bulgaria'},
{'1st': 'United States', '2nd': 'West Germany', '3rd': 'Romania'},
{'1st': 'Soviet Union', '2nd': 'East Germany', '3rd': 'United States'},
{'1st': 'Unified Team', '2nd': 'United States', '3rd': 'Germany'},
{'1st': 'United States', '2nd': 'Russia', '3rd': 'Germany'},
{'1st': 'United States', '2nd': 'Russia', '3rd': 'China'},
{'1st': 'United States', '2nd': 'Russia', '3rd': 'China'},
{'1st': 'China', '2nd': 'United States', '3rd': 'Great Britain'},
{'1st': 'United States', '2nd': 'China', '3rd': 'Russia'},
{'1st': 'United States', '2nd': 'Great Britain', '3rd': 'China'}]
Or:
print(my_output.medals)
Output:
[{'1st': 80, '2nd': 47, '3rd': 41},
{'1st': 83, '2nd': 61, '3rd': 30},
{'1st': 55, '2nd': 37, '3rd': 30},
{'1st': 45, '2nd': 37, '3rd': 26},
{'1st': 44, '2nd': 32, '3rd': 25},
{'1st': 37, '2nd': 32, '3rd': 27},
{'1st': 36, '2nd': 32, '3rd': 27},
{'1st': 51, '2nd': 36, '3rd': 29},
{'1st': 46, '2nd': 38, '3rd': 29},
{'1st': 46, '2nd': 27, '3rd': 26}]
Or:
print(my_output.year)
Output:
['1980',
'1984',
'1988',
'1992',
'1996',
'2000',
'2004',
'2008',
'2012',
'2016']
Or:
print(my_output.location)
Output:
['Moscow',
'Los Angeles',
'Seoul',
'Barcelona',
'Atlanta',
'Sydney',
'Athens',
'Beijing',
'London',
'Rio de Janeiro']
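Because the four fields are parallel lists aligned by position, they can be combined into one record per Games. The sketch below uses plain lists that copy the first two entries of the output above; with the real object you would pass `my_output.year`, `my_output.location`, and so on. It assumes the lists have equal length, which the schema itself does not enforce.

```python
years = ["1980", "1984"]
locations = ["Moscow", "Los Angeles"]
countries = [
    {"1st": "Soviet Union", "2nd": "East Germany", "3rd": "Bulgaria"},
    {"1st": "United States", "2nd": "West Germany", "3rd": "Romania"},
]
medals = [
    {"1st": 80, "2nd": 47, "3rd": 41},
    {"1st": 83, "2nd": 61, "3rd": 30},
]

# Pair each country with its gold-medal count, one record per Games.
records = [
    {
        "year": year,
        "location": location,
        "podium": {rank: (c[rank], m[rank]) for rank in ("1st", "2nd", "3rd")},
    }
    for year, location, c, m in zip(years, locations, countries, medals)
]

for record in records:
    print(record["year"], record["location"], record["podium"])
```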
Closing Remarks
Most of the time, we would like the output of an LLM to be structured, and LangChain's output parsers help us get there. Depending on the case, the required format can be challenging to obtain. Pydantic, combined with LangChain, gives us the ability to define more complex outputs, such as the lists of strings and lists of dictionaries used in this example.