
Strengths
- Provides a complete toolchain for RAG (Retrieval-Augmented Generation)
- Supports building complex AI agents and workflows
- Works with major model providers such as OpenAI, Anthropic, and Hugging Face
- LangSmith, LangChain's companion platform, provides tracing, debugging, and monitoring
- Active open-source community and a large library of ready-made integrations
Best for
- Building enterprise knowledge-base Q&A systems (RAG)
- Developing AI agents that use tools
- Building multi-step AI workflows (chains)
- Document processing and information extraction
- Building conversational AI applications
RAG (Retrieval-Augmented Generation)
RAG is LangChain's most common use case. It lets an LLM answer questions using the content of your own private documents.
Building a local document Q&A system
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Requires the OPENAI_API_KEY environment variable to be set.

# 1. Load the document (PyPDFLoader returns one Document per page)
loader = PyPDFLoader("company_manual.pdf")
documents = loader.load()

# 2. Split the pages into overlapping chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
)
chunks = splitter.split_documents(documents)

# 3. Embed the chunks and store them in an in-memory Chroma vector database
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)

# 4. Build a question-answering chain that retrieves the 3 most similar chunks
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
)

# 5. Ask a question
result = qa_chain.invoke({"query": "What is the company's annual leave policy?"})
print(result["result"])
print("source:", [doc.metadata for doc in result["source_documents"]])
```

The system will:
- Retrieve the 3 chunks of the PDF most relevant to the question
- Pass those chunks to GPT-4o as context
- Have GPT-4o answer the question based on that context
- Return the answer together with the metadata (file and page) of the source chunks
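The retrieve-then-answer flow above can be sketched without LangChain to show what happens between the question and the LLM call. This is a toy under stated assumptions. Bag-of-words cosine similarity stands in for OpenAI embeddings. The hypothetical build_prompt helper loosely mimics what RetrievalQA's default "stuff" chain type does, which is to paste the retrieved chunks into one prompt. The sample chunks are made up, and the sketch stops before the LLM is called.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stands in for OpenAIEmbeddings)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[dict], k: int = 3) -> list[dict]:
    """Return the k chunks most similar to the query (cf. search_kwargs={"k": 3})."""
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c["text"])), reverse=True)[:k]


def build_prompt(query: str, hits: list[dict]) -> str:
    """Hypothetical helper: paste the retrieved chunks into the prompt as context."""
    context = "\n\n".join(hit["text"] for hit in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Made-up sample chunks carrying the kind of metadata PyPDFLoader attaches
chunks = [
    {"text": "Annual leave: full-time employees receive 15 days of paid leave per year.",
     "source": "company_manual.pdf", "page": 4},
    {"text": "Expense reports must be submitted within 30 days.",
     "source": "company_manual.pdf", "page": 7},
    {"text": "Remote work requires manager approval.",
     "source": "company_manual.pdf", "page": 9},
    {"text": "Unused annual leave may be carried over to the next year.",
     "source": "company_manual.pdf", "page": 5},
]

question = "What is the company's annual leave policy?"
hits = retrieve(question, chunks, k=3)
prompt = build_prompt(question, hits)  # this string is what would be sent to the LLM
print(prompt)
print("source:", [(hit["source"], hit["page"]) for hit in hits])
```

Real embeddings capture meaning rather than shared words, but the shape of the pipeline is the same: score every chunk against the query, keep the top k, and send them to the model along with their metadata.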
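The values of chunk_size and chunk_overlap strongly affect answer quality and should be tuned to the document type. chunk_size sets the maximum length of each chunk (measured in characters by default), and chunk_overlap sets how much text consecutive chunks share, so that context spanning a chunk boundary is not lost.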
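To make the two parameters concrete, here is a simplified sliding-window splitter. It is not how RecursiveCharacterTextSplitter works internally. That class first tries to split on separators such as paragraph breaks, newlines, and spaces, then merges the pieces up to chunk_size. split_with_overlap is a hypothetical helper that only shows what chunk_size and chunk_overlap control, with lengths counted in characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous
    one, so consecutive chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks


print(split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3))
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Larger chunks give the model more context per retrieved hit but make each hit less focused. More overlap lowers the chance of splitting a key sentence across two chunks, at the cost of more redundant chunks to embed and store.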
Multi-document knowledge base system
```python
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader

# Load every PDF under ./docs, including subdirectories
loader = DirectoryLoader(
    "./docs",
    glob="**/*.pdf",
    loader_cls=PyPDFLoader,
)
documents = loader.load()
print(f"Loaded {len(documents)} pages")  # PyPDFLoader yields one Document per page

# The remaining steps are the same as for a single document.
# Each Document keeps its file path in metadata["source"], so one vector
# database can hold chunks from many files and still report where an answer came from.
```

For large document libraries, use a persistent vector database (for example, Chroma with a persist_directory, or Pinecone) so that embeddings do not have to be recomputed on every run.
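Persistent stores such as Chroma (with a persist_directory) or Pinecone save the computed vectors, so later runs can skip re-embedding the whole library. The idea can be sketched with a plain JSON file. This is a toy illustration, not Chroma's or Pinecone's API. load_or_build_index and toy_embed are hypothetical names, and toy_embed stands in for a real embedding model.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_build_index(
    index_path: Path,
    chunks: list[dict],
    embed: Callable[[str], list[float]],
) -> list[dict]:
    """Reuse vectors saved by a previous run; otherwise embed every chunk once and save."""
    if index_path.exists():
        return json.loads(index_path.read_text(encoding="utf-8"))
    records = [{**chunk, "vector": embed(chunk["text"])} for chunk in chunks]
    index_path.write_text(json.dumps(records), encoding="utf-8")
    return records


calls = 0


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model; counts how often it is called."""
    global calls
    calls += 1
    return [float(len(text)), float(text.count(" "))]


chunks = [
    {"text": "Annual leave policy", "source": "docs/hr/leave.pdf"},
    {"text": "Travel expense rules", "source": "docs/finance/travel.pdf"},
]

with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "index.json"
    first = load_or_build_index(index_path, chunks, toy_embed)   # embeds both chunks
    second = load_or_build_index(index_path, chunks, toy_embed)  # loads from disk instead
    print(calls, first == second)
```

This sketch only checks whether the index file exists. A real pipeline would also need to embed files that were added or changed after the index was built.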