Predictive Hacks

NER with OpenAI and LangChain

Named Entity Recognition (NER) is a natural language processing (NLP) technique used to identify and classify named entities within a text into predefined categories such as the names of persons, organizations, locations, dates, quantities, monetary values, percentages, and more. The primary goal of NER is to extract and categorize specific entities mentioned in unstructured text data to better understand the underlying information and relationships within the text.

A classic NER pipeline involves several steps:

  1. Tokenization: Breaking down the text into individual words or tokens.
  2. Part-of-Speech Tagging: Assigning grammatical parts of speech (e.g., noun, verb, adjective) to each token.
  3. Named Entity Classification: Identifying tokens that represent named entities and assigning them to predefined categories like person names, organization names, locations, etc.
  4. Entity Extraction: Extracting the identified named entities along with their respective categories from the text.
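To make these steps concrete, here is a deliberately tiny, rule-based sketch in plain Python. It tokenizes with a regular expression and classifies entities by looking them up in a small hand-made gazetteer. The GAZETTEER dictionary, MAX_PHRASE_LEN, tokenize and extract_entities are hypothetical names used only for illustration. The part-of-speech step is skipped, and a real NER system would use a trained model rather than a lookup table.

```python
import re

# Hypothetical toy gazetteer: a lookup table standing in for a trained classifier.
GAZETTEER = {
    "Predictive Hacks": "organization",
    "Athens": "location",
    "Greece": "location",
}
MAX_PHRASE_LEN = 2  # longest gazetteer entry, in tokens


def tokenize(text):
    # Step 1: split the text into word tokens and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def extract_entities(text):
    # Step 2 (part-of-speech tagging) is skipped in this toy version.
    tokens = tokenize(text)
    entities = []
    i = 0
    while i < len(tokens):
        # Steps 3-4: try the longest phrase first, then shorter ones.
        for length in range(MAX_PHRASE_LEN, 0, -1):
            phrase = " ".join(tokens[i:i + length])
            if phrase in GAZETTEER:
                entities.append((phrase, GAZETTEER[phrase]))
                i += length
                break
        else:
            i += 1  # no entity starts at this token
    return entities


print(extract_entities("I work at Predictive Hacks in Athens, Greece."))
```

For the sample sentence, this prints [('Predictive Hacks', 'organization'), ('Athens', 'location'), ('Greece', 'location')]. Matching the longest phrase first keeps multi-word names like "Predictive Hacks" together instead of treating each word separately.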

NER systems can be rule-based, statistical, or machine learning-based. Machine learning-based approaches, particularly those using deep learning models like recurrent neural networks (RNNs) or transformers, have shown significant advancements in NER accuracy due to their ability to learn complex patterns and representations from large amounts of data.

NER finds applications in various fields such as information retrieval, question answering systems, sentiment analysis, and more, where understanding specific entities within text data is crucial for analysis and decision-making.

On this blog, we have previously shared examples of Rule-Based Matching for NLP using SpaCy, NER with AWS Comprehend and NER with SpaCy. In this post, we will show you how to perform Named Entity Recognition using OpenAI and LangChain.

NER with LangChain

In our case, we want not only to recognize the entities but also to return them in a structured format. This is usually the most challenging part, but LangChain makes it much easier.

Let’s dive into coding by importing the required libraries and defining the model.

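from typing import List
from pydantic import BaseModel, Field

# Converts a Pydantic model into an OpenAI function definition
from langchain.utils.openai_functions import convert_pydantic_to_openai_function

# Parses the JSON arguments of the model's function call into a Python dict
from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser

from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# temperature=0 reduces randomness so repeated runs give more consistent extractions
model = ChatOpenAI(temperature=0)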

Next, we create the NER class, which uses the pydantic library to define the required structure of the output.

class NER(BaseModel):
    """We want to extract the 'name', 'age', 'date', 'address', 'phone', 'company name' and 'bank account' entities"""
    # Parallel lists: the i-th entry of type is the category of the i-th entry of ner
    ner: List[str] = Field(description="the detected entity in the document such as name, age, date, address, phone, company name and bank account")
    type: List[str] = Field(description="the type of the detected entity with possible values: 'name', 'age', 'date', 'address', 'phone', 'company name' and 'bank account'. For every entity detected in ner this should be the corresponding type")

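The next step is to convert the NER class into an OpenAI function definition and bind it to the model. Setting function_call={"name": "NER"} forces the model to always respond by calling this function: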

extraction_functions = [convert_pydantic_to_openai_function(NER)]
extraction_model = model.bind(functions=extraction_functions, function_call={"name": "NER"})
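To see what is actually sent to the model, it helps to look at the shape of an OpenAI function definition: a name, a description (taken from the class docstring) and a JSON Schema describing the parameters (built from the Pydantic fields). The dictionary below is a hand-written approximation of what convert_pydantic_to_openai_function produces for NER, with shortened descriptions. The exact output, for example whether schema titles are kept, may differ between LangChain versions, so treat it as an illustration rather than the library's literal output.

```python
# Hand-written approximation (an assumption, not LangChain's exact output) of the
# OpenAI function definition derived from the NER class. Descriptions are shortened.
ner_function = {
    "name": "NER",  # the class name becomes the function name
    "description": "Extract the named entities and their types.",  # from the docstring
    "parameters": {  # JSON Schema built from the Pydantic fields
        "type": "object",
        "properties": {
            "ner": {
                "type": "array",
                "items": {"type": "string"},
                "description": "the detected entities in the document",
            },
            "type": {
                "type": "array",
                "items": {"type": "string"},
                "description": "the type of each detected entity",
            },
        },
        # Both fields have no default value, so both are required.
        "required": ["ner", "type"],
    },
}

# Equivalent of function_call={"name": "NER"}: force a call to this function.
function_call = {"name": ner_function["name"]}
print(function_call)
```

Because both fields are lists of strings, the model is asked to return two arrays of equal length rather than free text.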
 

Now, we are ready to create the prompt:

prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract the relevant information, if not explicitly provided do not guess. Extract partial info"),
    ("human", "{input}")
])
 

Finally, we create a chain that combines the prompt, the model and an output parser that turns the model's function call into structured JSON.

extraction_chain = prompt | extraction_model | JsonOutputFunctionsParser()
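Because the function call is forced, the model's reply typically carries little or no ordinary text. Instead, the extracted data arrives as a JSON-encoded string in the arguments of the function call, and JsonOutputFunctionsParser decodes that string into a Python dictionary. The standard-library sketch below mimics this step on a hypothetical raw payload. parse_function_arguments and raw_call are illustrative names, not part of LangChain. The payload layout follows the OpenAI function-calling format, where arguments is a JSON string.

```python
import json


def parse_function_arguments(function_call):
    """Decode the JSON-encoded 'arguments' of an OpenAI-style function call.

    Hypothetical helper that approximates what JsonOutputFunctionsParser does.
    """
    return json.loads(function_call["arguments"])


# Hypothetical raw function call, shaped like an OpenAI function-calling payload.
raw_call = {
    "name": "NER",
    "arguments": '{"ner": ["Joe Smith", "Athens, Greece"], "type": ["name", "address"]}',
}

parsed = parse_function_arguments(raw_call)
print(parsed["ner"])
print(parsed["type"])
```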
 

We are done. Let’s test it on the following text.

mytxt = """My name is Joe Smith and I am married to Anna Parker and we have one child called Gilda. I was born in 1988 and I am 35 years old.
I work at Predictive Hacks and my email is [email protected]. 
I live in Athens, Greece. My phone number is 623 12 34 567 and my bank account is 123-123-567-888"""
 

extraction_chain.invoke({"input": mytxt})
 

Output:

{'ner': ['Joe Smith',
  'Anna Parket',
  'Gilda',
  '1988',
  '35',
  'Predictive Hacks',
  '[email protected]',
  'Athens, Greece',
  '623 12 34 567',
  '123-123-567-888'],
 'type': ['name',
  'name',
  'name',
  'date',
  'age',
  'company name',
  'email',
  'address',
  'phone',
  'bank account']}

As you can see, the model detected all the entities. The output is a JSON object (a Python dictionary) with two keys: ner, which holds the detected entities, and type, which holds the type of each entity. Both values are lists, and the i-th type corresponds to the i-th entity.
