
LangChain — User Guide

Building LangChain applications: RAG and agents.

Strengths
  • Provides a complete toolchain for RAG (Retrieval-Augmented Generation)
  • Supports building complex AI agents and workflows
  • Integrates with mainstream model providers such as OpenAI, Anthropic, and Hugging Face
  • LangSmith adds tracing, debugging, and monitoring
  • Active open-source community with a large number of ready-made integrations
Best for
  • Building an enterprise knowledge-base question-answering system (RAG)
  • Developing AI agents that use tools
  • Building multi-step AI workflows (chains)
  • Document processing and information extraction
  • Building conversational AI applications

RAG (Retrieval-Augmented Generation)

RAG is the most common use case for LangChain. It lets a model answer questions using your private documents: relevant passages are retrieved first and passed to the model as context.

Scenario

Build a question-answering system over a local PDF document

Prompt example
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

# 1. Load document
loader = PyPDFLoader("company_manual.pdf")
documents = loader.load()

# 2. Split text
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
)
chunks = splitter.split_documents(documents)

# 3. Create vector database
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)

# 4. Build a question and answer chain
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)

# 5. Ask a question
result = qa_chain.invoke({"query": "What is the company's annual leave policy?"})
print(result["result"])
print("source:", [doc.metadata for doc in result["source_documents"]])
Output / what to expect

The system will:

  1. Retrieve the 3 most relevant text chunks from the PDF (k=3)
  2. Pass these chunks to GPT-4o as context
  3. Have GPT-4o answer the question based on that context
  4. Return the answer together with the metadata of the source chunks
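
The retrieve-then-answer flow above can be sketched without any external service. This is a minimal illustration only: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the sample chunks are invented, and `top_k` and `build_prompt` are hypothetical helper names, not LangChain APIs.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the retrieved chunks and the question into one prompt."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n\n{joined}\n\nQuestion: {query}"


# Hypothetical chunks, invented for illustration only.
chunks = [
    "Employees receive 15 days of annual leave per year.",
    "The office is closed on public holidays.",
    "Annual leave requests must be approved by a manager.",
    "Expense reports are due at the end of each month.",
]
question = "How many days of annual leave do employees get?"
hits = top_k(question, chunks, k=2)
prompt = build_prompt(question, hits)
print(prompt)
```

In the real chain, the vector store performs the ranking with learned embeddings and the assembled prompt is sent to the chat model; the structure of the steps is the same.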
Tips

chunk_size sets the maximum length of each chunk (measured in characters by default) and chunk_overlap sets how much text adjacent chunks share. Both strongly affect retrieval quality, so tune them to the document type.
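
To see how the two settings interact, here is a simplified fixed-window splitter. It is a sketch for illustration, not the algorithm RecursiveCharacterTextSplitter uses (that splitter also tries to break on separators such as paragraphs and sentences); `split_with_overlap` is a hypothetical helper name.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"  # 26 characters
small = split_with_overlap(text, chunk_size=10, chunk_overlap=2)
large = split_with_overlap(text, chunk_size=10, chunk_overlap=5)
print(small)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
print(len(large))  # 5
```

A larger overlap keeps more shared context at chunk boundaries but produces more chunks to embed and store for the same text.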

Scenario

Multi-document knowledge base system

Prompt example
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader

# Load documents from the entire directory
loader = DirectoryLoader(
    "./docs",
    glob="**/*.pdf",
    loader_cls=PyPDFLoader
)
documents = loader.load()
# PyPDFLoader returns one Document per PDF page
print(f"{len(documents)} pages loaded")

# The remaining steps (split, embed, build the chain) are the same as for a single document.
# The vector store indexes chunks from all files together.
Output / what to expect
All PDFs in the directory are loaded at once and indexed into a single knowledge base, which supports question answering and retrieval across documents.
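
With several files in one store, the chunks behind an answer can come from different PDFs. The helper below summarizes which files and pages were cited. It is a sketch that assumes each chunk's metadata carries a "source" path and a 0-based "page" index, which is what PDF loaders commonly set; check the metadata your loader actually produces. `group_sources` is a hypothetical name.

```python
from collections import defaultdict


def group_sources(metadatas: list[dict]) -> dict[str, list[int]]:
    """Map each source file to the sorted, de-duplicated pages that were cited."""
    pages_by_file: dict[str, set[int]] = defaultdict(set)
    for meta in metadatas:
        # Assumed keys: "source" (file path) and "page" (0-based page index).
        pages = pages_by_file[meta.get("source", "unknown")]
        if "page" in meta:
            pages.add(meta["page"])
    return {src: sorted(pages) for src, pages in pages_by_file.items()}


# Hypothetical metadata, shaped like doc.metadata for each returned source document.
example = [
    {"source": "docs/hr/leave.pdf", "page": 4},
    {"source": "docs/hr/leave.pdf", "page": 2},
    {"source": "docs/finance/expenses.pdf", "page": 0},
]
print(group_sources(example))
```

It could be called as `group_sources([doc.metadata for doc in result["source_documents"]])` on the output of the question-answering chain.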
Tips

For large document libraries, use a persistent vector store (for example, Chroma with a persist directory, or a hosted service such as Pinecone) so that documents are not re-embedded on every run.
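
One way to apply this with Chroma is to reopen a saved index when it exists and build it only on the first run. The sketch below assumes the `persist_directory` and `embedding_function` parameters of the `langchain_community` Chroma wrapper; `has_saved_index` and `open_or_build_store` are hypothetical helper names, and the exact persistence behavior depends on the installed Chroma version.

```python
import os


def has_saved_index(persist_dir: str) -> bool:
    """True if persist_dir exists and contains at least one entry."""
    return os.path.isdir(persist_dir) and bool(os.listdir(persist_dir))


def open_or_build_store(persist_dir, chunks, embeddings):
    """Reopen a saved Chroma index, or build and save one from chunks."""
    # Imported lazily so has_saved_index works without LangChain installed.
    from langchain_community.vectorstores import Chroma

    if has_saved_index(persist_dir):
        # Reuse the stored vectors; nothing is re-embedded.
        return Chroma(persist_directory=persist_dir, embedding_function=embeddings)
    # First run: embed the chunks and write the index to disk.
    return Chroma.from_documents(chunks, embeddings, persist_directory=persist_dir)
```

A call such as `open_or_build_store("./chroma_db", chunks, OpenAIEmbeddings())` would then replace the in-memory `Chroma.from_documents(chunks, embeddings)` step; the directory name is only an example.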
