VTeam AI

Beginner Tutorials

LangChains for NER, Summarization and Text Tagging

Diving deeper into the world of LangChains? Our recent tutorial unveiled the magic of crafting apps using the LangChain framework over Large Language Models (LLMs). From Grammar Checkers to Language Translators, we've made the journey beginner-friendly. Ready for more advanced NLP challenges? Stick around as we unravel them using LangChains & LLMs!

We have already covered a beginner-friendly tutorial on LangChains, showing how to create different apps using the LangChain framework over LLMs. If you missed it, we have got you covered here.

So, as you must know, LangChain is a powerful framework built around Large Language Models (LLMs), designed for tasks such as chatbots, generative question-answering, summarization, and more. The core idea of LangChain is to "chain" together different components, creating sequences of components or sub-chains to achieve specific tasks. These components include prompt templates, language models, and output parsers, working together to handle user input, generate responses, and process outputs. LangChain simplifies the customization of models like GPT-3 by providing an API for prompt engineering, making LLMs more approachable for various applications. It also streamlines integration with different types of models and interfaces, letting LLMs take strings as input and produce strings as output, which simplifies the development of applications powered by large language models.

In the last tutorial, we built 3 different apps, i.e. a Grammar Checker, a Tone Changer, and a Language Translator, for which the code is available in the blog post mentioned above. Raising the difficulty a little, we will now jump onto some NLP-related problem statements and see how to implement them using LangChains & LLMs. But before jumping into the code, let's understand a few fundamental concepts required to follow the tutorial.


Named Entity Recognition (NER): NER involves identifying and classifying named entities (such as names of people, places, organizations, dates, etc.) in text. Example: "Johnny lives in Florida" -> NER identifies "Johnny" as a PERSON and "Florida" as a GPE (Geopolitical Entity).

Text tagging: Text tagging in Natural Language Processing (NLP) involves assigning specific labels or tags to words in a text corpus to indicate their grammatical or semantic properties. These tags provide information about the part of speech (noun, verb, adjective, etc.) or other linguistic characteristics of each word.

LLMs: Large Language Models (LLMs) are a type of artificial intelligence that is characterized by their large size. They have the ability to process and generate human-like responses to natural language queries. LLMs are trained on vast amounts of text data, often scraped from the Internet, using AI accelerators. They are used in various natural language processing tasks and have the capability to generate coherent and contextually relevant text. Popular examples of LLMs include BERT, GPT-3, and T5. These models have revolutionized natural language processing by demonstrating the ability to generate human-like text and comprehend context, leading to applications in chatbots, content generation, translation, and more.

LangChains: LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). It allows chaining together different components to create advanced use cases around LLMs, such as chatbots, generative question-answering, summarization, and more. LangChain offers a standard interface for building chains of models and integrates with various tools. It facilitates interactions between chains and external data sources for data-augmented generation.


Tutorial 1: Named Entity Recognition

In this short demo, we will extract the name of a person, organization, and city. Let’s get started with the required imports.

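Below is a minimal sketch of what the imports and schema might look like, assuming the classic `create_extraction_chain` helper from LangChain and an OpenAI chat model; the exact imports and entity names in the original post may differ.

```python
# Sketch only: assumes the classic LangChain extraction API and an OpenAI chat model.
from langchain.chat_models import ChatOpenAI
from langchain.chains import create_extraction_chain

# The schema is a plain dictionary describing the entities we want to pull out.
schema = {
    "properties": {
        "person_name": {"type": "string"},
        "organization": {"type": "string"},
        "city": {"type": "string"},
    },
    "required": [],
}
```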

Here, the schema is a dictionary listing the entities we wish to extract.


Next, we load the LLM object. Remember, we need an api_key at this point. Pass the schema and the LLM as parameters to create_extraction_chain().
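In code, a hedged sketch of that step (the model name and placeholder key below are our own):

```python
# Load the LLM; an OpenAI API key is needed here.
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo", openai_api_key="YOUR_API_KEY")

# Build the extraction chain from the schema and the LLM.
chain = create_extraction_chain(schema, llm)
```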

Time for some testing.


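A test might look something like this; the sentence and entities below are invented for illustration, and the exact output structure depends on the LangChain version.

```python
# Hypothetical test sentence; the names are made up for illustration.
text = "Priya works at Google and recently moved to Bengaluru."
result = chain.run(text)
print(result)
# The output is a list of dictionaries, roughly of the form:
# [{'person_name': 'Priya', 'organization': 'Google', 'city': 'Bengaluru'}]
```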

As you can observe, we have successfully retrieved all the entities present in the text. If you want other entities as well, mention them in the schema.

Tutorial 2: Article Summarization

This one is really interesting. We will give LangChain any URL, be it a GitHub repo or a general article, and then summarize it using the different chain types present in LangChains.

Importing libraries.

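Presumably something along these lines; we assume WebBaseLoader for fetching the URL, though the original post may have used a different document loader.

```python
# Sketch only: a URL loader plus the summarization chain helper from classic LangChain.
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.chains.summarize import load_summarize_chain
```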

Next, load an LLM. This requires you to have an api_key.

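A sketch of that step, which also fetches the page into memory as LangChain documents (the placeholder key is ours):

```python
# Load the LLM (OpenAI API key required).
llm = ChatOpenAI(temperature=0, openai_api_key="YOUR_API_KEY")

# Fetch the page we want to summarize and load it as a list of documents.
loader = WebBaseLoader("https://github.com/facebookresearch/llama")
docs = loader.load()
```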

Next, we will try summarizing the repo https://github.com/facebookresearch/llama, which we loaded into memory in the step above, using the different chain types present in LangChains.

ChainType=’stuff’.

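A sketch, reusing the llm and docs from above:

```python
# chain_type="stuff": all documents are inserted into one prompt and summarized in a single call.
chain = load_summarize_chain(llm, chain_type="stuff")
print(chain.run(docs))
```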

ChainType=’map_reduce’.

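The same idea, switching the chain_type:

```python
# chain_type="map_reduce": summarize each document first, then combine the partial summaries.
chain = load_summarize_chain(llm, chain_type="map_reduce")
print(chain.run(docs))
```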

ChainType=’refine’.

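And with the refine chain type:

```python
# chain_type="refine": iterate over the documents, refining the running summary at each step.
chain = load_summarize_chain(llm, chain_type="refine")
print(chain.run(docs))
```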

As you can see, changing the chain type affects the output. But what are these different chain types? A chain type basically determines how the documents and the prompt are assembled in the backend before being fed to the LLM.

Stuff Chain: The stuff chain takes all the documents, "stuffs" them into a single prompt, and sends that one prompt to the LLM. It is the simplest option and works well as long as the combined documents fit within the model's context window.

Map_Reduce Chain: The map_reduce chain first applies an LLM chain to each document individually (Map step), treating the chain's output as a new document. Then, it passes these new documents through a separate chain to combine them and produce a single output (Reduce step).

Refine Chain: The refine chain loops over the documents one by one: it produces an initial answer from the first document and then, for each subsequent document, asks the LLM to refine the running answer using the new content. This iterative updating can give more detailed summaries at the cost of more LLM calls.

Tutorial 3: Text tagging and classification

The most useful of the lot: text tagging helps you extract entities/features from input text, which can also be helpful for classification.

Importing libraries.

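Likely something like this, assuming the classic `create_tagging_chain` helper:

```python
# Sketch only: the tagging chain helper from classic LangChain plus an OpenAI chat model.
from langchain.chat_models import ChatOpenAI
from langchain.chains import create_tagging_chain
```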

Designing the schema.

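A hedged sketch of such a schema; the property names, enums, and descriptions below are our own choices, not necessarily the ones used in the original post.

```python
# Hypothetical tagging schema: each property describes one thing to extract from the text.
schema = {
    "properties": {
        "aggressiveness": {
            "type": "integer",
            "enum": [1, 2, 3, 4, 5],
            "description": "How aggressive the tone is, on a scale of 1 to 5",
        },
        "language": {
            "type": "string",
            "enum": ["english", "spanish", "french", "hindi"],
            "description": "The language the text is written in",
        },
        "mood": {
            "type": "string",
            "enum": ["happy", "neutral", "sad", "angry"],
            "description": "The overall mood of the text",
        },
        "grammatically_correct": {
            "type": "boolean",
            "description": "Whether the text is grammatically correct",
        },
        "trait": {
            "type": "string",
            "description": "A single personality trait conveyed by the writer",
        },
    },
    "required": ["aggressiveness", "language", "mood"],
}
```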

Here, we are extracting 5 things:

  1. Aggressiveness of the tone.
  2. Language.
  3. Mood.
  4. Grammar.
  5. Trait.

Some points to note:

  • Enum is used to restrict the output for a category to a finite list of values.
  • A description can be assigned to each category to help the LLM understand it better.
  • The data type can be anything, be it float, boolean, or string.
  • The schema is a combination of multi-class and binary classification.

Loading LLM and creating a chain. You need an api_key for this!

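A sketch of this step (the model name and placeholder key are ours):

```python
# Load the LLM (OpenAI API key required) and build the tagging chain from the schema.
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo", openai_api_key="YOUR_API_KEY")
chain = create_tagging_chain(schema, llm)
```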

Examples.

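For instance (the sentences are invented, and the tagged values shown in the comments are only indicative of the output shape):

```python
# A friendly English sentence.
print(chain.run("I am so happy we finally met! Today has been a lovely day."))
# e.g. {'aggressiveness': 1, 'language': 'english', 'mood': 'happy', ...}

# An angry Spanish sentence.
print(chain.run("Estoy muy enojado contigo! Te voy a dar tu merecido!"))
# e.g. {'aggressiveness': 5, 'language': 'spanish', 'mood': 'angry', ...}
```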

With this, we will be wrapping up our 2nd edition on LangChains.

In conclusion, exploring the world of LangChain has been an enlightening journey. Through the tutorials, we've uncovered the power of LangChain, a versatile framework built around Large Language Models (LLMs). The modular and flexible nature of LangChain empowers us to create advanced language model applications with ease. By understanding the core concepts, such as components, chains, prompt templates, and agents, we've unlocked the potential to build innovative solutions for chatbots, question-answering, summarization, and more. As we delve deeper into LangChain, we're equipped with the tools to harness the capabilities of LLMs and pave the way for groundbreaking applications in natural language processing. The journey has just begun, and with LangChain, the possibilities are limitless.

We will be bringing in more tutorials soon. Stay tuned!!

Disclaimer: The views and opinions expressed in this blog post are solely those of the authors and do not reflect the official policy or position of any of the mentioned tools. This blog post is not a form of advertising and no remuneration was received for the creation and publication of this post. The intention is to share our findings and experiences using these tools and is intended purely for informational purposes.