# LangGraph RAG Agent Tutorial | Basics to Advanced Multi-Agent AI Chatbot

Retrieval-Augmented Generation (RAG) is becoming the go-to pattern for building AI systems that can fetch real-time or domain-specific knowledge on demand. But RAG alone doesn’t make your chatbot smart.

With LangGraph, you can build stateful, agent-like flows that combine tools, memory, structured decision logic, and retrieval—all driven by LLMs.

In this blog, we’ll build up to a full LangGraph-based RAG Agent from scratch. We'll follow a practical path:

1. Start with basic LLM usage
    
2. Bind tools to the LLM
    
3. Use LangGraph to build stateful agents
    
4. Add memory, routing logic, and tool execution
    
5. Finally, combine all of it with document retrieval to create a RAG-powered agent
    

Each section mirrors what you’d build in a notebook, but with clear explanations to help you understand *why* each piece matters.

Let’s start with the first building block: invoking an LLM.

---

### 🧠 Step 1: Invoking a Language Model (LLM)

To begin, we’ll use `ChatOpenAI` from LangChain to invoke a language model. We’ll keep it simple:

```python
from langchain_openai import ChatOpenAI

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)

# Basic prompt
response = llm.invoke("What is artificial intelligence?")
print(response.content)
```

This returns a standard response from the LLM. But the real value comes when you treat the LLM like a conversation partner using message objects:

```python
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful AI assistant that explains complex topics simply."),
    HumanMessage(content="Explain machine learning in 2 sentences.")
]

response = llm.invoke(messages)
print(response.content)
```

Using `SystemMessage` and `HumanMessage` gives you more control over behavior and tone. It’s also how you’ll structure inputs later when building memory-enabled and multi-step agents.

Now that we can invoke an LLM in both simple and structured ways, we’re ready to start integrating tools.

---

### 🔧 Step 2: Extending LLMs with Tools

LLMs are powerful, but on their own they are unreliable at arithmetic and cannot fetch real-time information. To make your LLM more useful, you can bind it to external tools. Here’s how:

```python
from langchain_core.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun

@tool
def calculator(expression: str) -> str:
    """Calculate mathematical expressions. Use this for any math calculations."""
    try:
        # Note: eval() executes arbitrary Python, so only use this in a trusted local demo
        result = eval(expression)
        return f"The result of {expression} is {result}"
    except Exception as e:
        return f"Error calculating {expression}: {str(e)}"

search_tool = DuckDuckGoSearchRun()
```

We now have two tools:

* `calculator` to perform basic arithmetic
    
* `search_tool` to fetch info from the web
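
One caveat on `calculator`: `eval` will run any Python expression the model passes in, which is risky when the input ultimately comes from a user. A minimal sketch of a stricter alternative, assuming only basic arithmetic is needed, walks the expression with the standard-library `ast` module. The name `safe_eval` and the operator whitelist are illustrative choices, not part of the original notebook.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> float:
    """Evaluate a basic arithmetic expression without calling eval()."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            value = node.value
            # Accept plain numbers only (bool is a subclass of int, so exclude it).
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval("25 * 4 + 17"))  # 117
```

Swapping `eval(expression)` for a helper like this keeps the tool's interface unchanged while refusing function calls, attribute access, and names.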
    

To bind these tools to the LLM:

```python
# Bind tools to the LLM
tools = [calculator, search_tool]
llm_with_tools = llm.bind_tools(tools)
```

Let’s test the LLM with tools:

```python
response = llm_with_tools.invoke("What's 25 * 4 + 17?")
print(response.content)
```

However, when an LLM is tool-enabled, its response might include `tool_calls` instead of just plain text. To handle that:

```python
def handle_tool_calls(response, tool_map):
    if not getattr(response, 'tool_calls', None):
        return

    for tool_call in response.tool_calls:
        tool_name = tool_call['name']
        args = tool_call['args']
        tool = tool_map.get(tool_name)
        if tool:
            result = tool.invoke(args)
            print(f"Tool result: {result}")
```

Then:

```python
tool_map = {
    'calculator': calculator,
    'duckduckgo_search': search_tool,
}

def test_llm_tool(query):
    response = llm_with_tools.invoke(query)
    print(response.content)
    handle_tool_calls(response, tool_map)

# Run some queries
test_llm_tool("What's 25 * 4 + 17?")
test_llm_tool("Search for recent news about artificial intelligence")
```

With this setup, your LLM can request tools and we execute them for it, although the results are only printed and not yet fed back to the model.

Next, we’ll take this a step further by wiring everything into a LangGraph to make it stateful and multi-turn.

---

### 🧩 Step 3: Building a Basic LangGraph Chatbot

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1750513436662/a0dde8f7-3b8d-40df-a6c4-526914621c70.png align="center")

At its core, LangGraph lets you define a graph of nodes that process conversational state. Let’s begin with a minimal chatbot graph.

#### Define Chatbot State

```python
from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
```

Here, we define a `State` object that will carry the conversation. The `add_messages` function ensures new messages are appended correctly.
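
To make the reducer idea concrete, here is a simplified, standard-library-only sketch of how an annotated reducer can merge a node's partial update into the running state. `append_reducer` and `apply_update` are hypothetical names used for illustration; the real `add_messages` does more (for example, it also replaces messages whose IDs match an existing one).

```python
from typing import Any, Callable

Reducer = Callable[[list, list], list]


def append_reducer(existing: list, new: list) -> list:
    """Return a new list with the new items appended to the existing ones."""
    return existing + new


def apply_update(
    state: dict[str, Any],
    update: dict[str, Any],
    reducers: dict[str, Reducer],
) -> dict[str, Any]:
    """Merge a partial update: keys with a reducer are combined, others overwritten."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](state.get(key, []), value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["user: Hello!"]}
update = {"messages": ["assistant: Hi, how can I help?"]}  # what a node returns
state = apply_update(state, update, {"messages": append_reducer})
print(state["messages"])
```

This is why a node can return only its new message rather than the whole history: the reducer attached to the key decides how the update is combined.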

#### Create the Chatbot Node

```python
def chatbot_node(state: State) -> State:
    response = llm.invoke(state["messages"])
    return {"messages": [response]}
```

This node accepts messages and returns the updated state with the AI's response.

#### Build and Compile the Graph

```python
from langgraph.graph import StateGraph, START, END

graph_builder = StateGraph(State)
graph_builder.add_node("chatbot", chatbot_node)
graph_builder.add_edge(START, "chatbot")
graph_builder.add_edge("chatbot", END)
graph = graph_builder.compile()
```

This sets up a simple one-node chatbot pipeline. You can now test it:

```python
def test_chatbot(message: str):
    initial_state = {"messages": [HumanMessage(content=message)]}
    result = graph.invoke(initial_state)
    print("🤖 Assistant:", result["messages"][-1].content)

test_chatbot("Hello! My name is Pradip")
test_chatbot("Do you remember my name?")
```

You’ll notice it doesn’t remember past messages yet. That’s what we’ll fix in the next step—by adding memory.

---

### 🧠 Step 4: Adding Memory to the Chatbot

To make the chatbot remember previous turns, we need a checkpointer that saves the graph state between calls.  
LangGraph provides `MemorySaver`, an in-memory checkpointer, for this purpose.

```python
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()

# Compile the graph again with memory enabled
graph_with_memory = graph_builder.compile(checkpointer=memory)
```

We can now pass a `thread_id` with each call, and the chatbot will retain context within that thread:

```python
def chat_with_memory(message: str, thread_id: str):
    config = {"configurable": {"thread_id": thread_id}}
    initial_state = {"messages": [HumanMessage(content=message)]}
    result = graph_with_memory.invoke(initial_state, config)
    print("🤖 Assistant:", result["messages"][-1].content)

# Start a conversation
chat_with_memory("Hi, my name is Pradip", thread_id="thread-1")
chat_with_memory("What's my name?", thread_id="thread-1")
```

With memory in place, the assistant can now recall previous messages.  
This forms the foundation for building multi-turn, context-aware agents.
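
Conceptually, a checkpointer keeps one saved state per `thread_id` and reloads it before each turn. The toy class below models only that idea with a plain dictionary; `ToyCheckpointer` and `run_turn` are hypothetical names, and this is not how `MemorySaver` is implemented internally.

```python
from collections import defaultdict


class ToyCheckpointer:
    """Stores one message history per thread_id, in process memory only."""

    def __init__(self) -> None:
        self._threads: dict[str, list[str]] = defaultdict(list)

    def load(self, thread_id: str) -> list[str]:
        # Return a copy so callers cannot mutate the saved history by accident.
        return list(self._threads[thread_id])

    def save(self, thread_id: str, messages: list[str]) -> None:
        self._threads[thread_id] = list(messages)


def run_turn(checkpointer: ToyCheckpointer, thread_id: str, user_message: str) -> list[str]:
    """Load the thread's history, add the new turn, and save it back."""
    history = checkpointer.load(thread_id)
    history.append(f"user: {user_message}")
    # Stand-in for the LLM call: report how many messages it could see.
    history.append(f"assistant: I can see {len(history)} message(s) in this thread.")
    checkpointer.save(thread_id, history)
    return history


cp = ToyCheckpointer()
run_turn(cp, "thread-1", "Hi, my name is Pradip")
second = run_turn(cp, "thread-1", "What's my name?")
other = run_turn(cp, "thread-2", "What's my name?")
print(len(second), len(other))
```

The second call on `thread-1` sees the earlier turn, while `thread-2` starts from an empty history, which is the same isolation you get by changing `thread_id` in the config.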

Next, we’ll add more intelligence to the flow using routing and tools.

## 🛠️ Step 5 – LangGraph Agent with Tools

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1750513455175/b7ea4e76-8e68-4341-b9e7-6d2925859b8f.png align="center")

So far, our chatbot can talk (Step 3) and remember context (Step 4). Now we want it to **recognise when a tool is needed and call it automatically**.

At a high level we’ll add two new pieces:

1. `chatbot` node – decides whether it can answer directly or should call a tool.
    
2. `tools` node – actually runs the requested tool‑call and passes the result back.
    

The conversation state stays the same – a list of LangChain message objects (`BaseMessage` subclasses) – so we just rename it to emphasise the agent role:

```python
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    """State for our two‑node agent"""
    messages: Annotated[list[BaseMessage], add_messages]
```

---

### 1\. Bind the LLM to our existing tools

```python
llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)
llm_with_tools = llm.bind_tools(tools)  # `tools` already contains `calculator` and `search_tool`
```

Binding keeps the API exactly the same – we just swap `llm` for `llm_with_tools` when we need tool‑usage.

---

### 2\. The **chatbot** node – decide *answer vs. tool*

```python
from langchain_core.messages import HumanMessage, AIMessage

def chatbot_node(state: AgentState) -> AgentState:
    """Gatekeeper: answer directly or request a tool"""
    system_message = (
        "You are a helpful assistant.\n"
        "Use the `duckduckgo_search` tool for real-time facts and `calculator` for maths.\n"
        "Otherwise answer directly."
    )

    messages = [
        {"role": "system", "content": system_message},
        *state["messages"],
    ]

    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}  # LangGraph merges this into the running state
```

*Key idea*: we embed the routing logic inside the prompt. The LLM decides whether tool calls are needed and, if so, returns an `AIMessage` whose `tool_calls` list is populated.

---

### 3\. The **tools** node – run any requested tool‑calls

Instead of re‑implementing the execution loop, we reuse the pre‑built `ToolNode`:

```python
from langgraph.prebuilt import ToolNode

tool_node = ToolNode(tools)  # runs each requested tool and returns the results as ToolMessage objects
```

---

### 4\. Routing logic

We just need a small helper that checks whether the last message contains tool calls:

```python
from typing import Literal

def should_continue(state: AgentState) -> Literal["tools", "end"]:
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else "end"
```

---

### 5\. Wire it all together with `StateGraph`

```python
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

workflow = StateGraph(AgentState)
workflow.add_node("chatbot", chatbot_node)
workflow.add_node("tools",   tool_node)

workflow.add_edge(START, "chatbot")
workflow.add_conditional_edges("chatbot", should_continue, {"tools": "tools", "end": END})
workflow.add_edge("tools", "chatbot")  # come back after tools run

app = workflow.compile(checkpointer=MemorySaver())
```

> **Why a loop back to** `chatbot`? After a tool runs we want the LLM to integrate the tool output and craft the final answer, so the graph returns to `chatbot` and repeats until the LLM replies without requesting a tool.
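
The control flow of this loop can be sketched without LangGraph at all. In the toy below, `fake_model` and `run_agent` are hypothetical stand-ins: the fake model requests a tool on its first call and answers once a tool result is present. The `max_steps` guard is an added assumption to keep the sketch from looping forever.

```python
from typing import Callable

Message = dict  # e.g. {"role": "ai", "content": "...", "tool_calls": [...]}


def fake_model(messages: list[Message]) -> Message:
    """Stand-in for the LLM: ask for the calculator once, then answer."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {
            "role": "ai",
            "content": "",
            "tool_calls": [{"name": "calculator", "args": {"a": 25, "b": 4}}],
        }
    return {
        "role": "ai",
        "content": f"The answer is {tool_results[-1]['content']}.",
        "tool_calls": [],
    }


def run_agent(
    messages: list[Message],
    tools: dict[str, Callable[..., object]],
    max_steps: int = 5,
) -> list[Message]:
    """Alternate between the model and the tools until no tool is requested."""
    messages = list(messages)
    for _ in range(max_steps):
        reply = fake_model(messages)        # plays the role of the "chatbot" node
        messages.append(reply)
        if not reply["tool_calls"]:         # same check as should_continue -> "end"
            break
        for call in reply["tool_calls"]:    # plays the role of the "tools" node
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": str(result)})
    return messages


history = run_agent(
    [{"role": "user", "content": "What's 25 * 4?"}],
    {"calculator": lambda a, b: a * b},
)
print(history[-1]["content"])
```

The resulting history is user, AI tool request, tool result, final AI answer, which mirrors the message sequence the real graph accumulates in its state.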

---

### 6\. Quick manual test

```python
def chat_with_agent(msg: str, thread_id="demo"):
    cfg = {"configurable": {"thread_id": thread_id}}
    state = {"messages": [HumanMessage(content=msg)]}
    result = app.invoke(state, cfg)
    print(result["messages"][-1].content)

chat_with_agent("What's 15% of 240?")
chat_with_agent("Search for recent news about artificial intelligence")
```

You should see the *calculator* and *duckduckgo\_search* tools being triggered automatically, followed by a neat, fully-formed answer.

---

That’s a self‑routing, tool‑aware agent. In the next step we’ll **plug a knowledge‑base retriever into the tool‑chain** and teach the agent when to switch from web search to internal RAG – bringing us one step closer to a production‑ready assistant.

## 🔍 Step 6 – LangGraph RAG Agent

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1750513305336/aa14ccd5-f220-4d5f-b7e8-2007004b5979.png align="center")

> **Goal:** Give your agent up‑to‑date, domain‑specific knowledge so it can answer beyond the LLM’s training data.
> 
> We’ll layer **retrieval**, **routing**, and an optional **web‑search fallback** on top of the tool‑enabled agent from Step 5.

### 1️⃣ Index your documents once

```python
# ── Build & persist a Chroma index ────────────────────────────────
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader, Docx2txtLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

SOURCE_DIR   = Path("docs")          # put your files here
INDEX_DIR    = Path("chroma_db_1")   # will be created if missing
EMBED_MODEL  = "text-embedding-3-small"

# Load docs (keep only pdf/docx for brevity)
docs = []
for f in SOURCE_DIR.glob("*.*"):
    suffix = f.suffix.lower()  # also match upper-case extensions such as .PDF
    if suffix == ".pdf":
        docs += PyPDFLoader(str(f)).load()
    elif suffix == ".docx":
        docs += Docx2txtLoader(str(f)).load()

# Split & embed
chunks     = RecursiveCharacterTextSplitter(chunk_size=1_000, chunk_overlap=200).split_documents(docs)
embeddings = OpenAIEmbeddings(model=EMBED_MODEL)

vectordb = Chroma.from_documents(
    documents         = chunks,
    embedding         = embeddings,
    persist_directory = str(INDEX_DIR),
    collection_name   = "kb_collection",
)
vectordb.persist()  # explicit flush; recent Chroma versions persist automatically and deprecate this call
print("✅ Index built →", INDEX_DIR.resolve())
```

*Run this once; the agent will query the saved index at runtime.*
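
To see what `chunk_size` and `chunk_overlap` mean, the sketch below uses a plain fixed-size sliding window. This is a deliberately simpler technique than `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraphs and sentences; `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "x" * 2_600
chunks = sliding_window_chunks(text, chunk_size=1_000, chunk_overlap=200)
print([len(c) for c in chunks])  # [1000, 1000, 1000]
```

With a 1,000-character window and a 200-character overlap, each chunk starts 800 characters after the previous one, so a sentence cut at one boundary usually appears whole in the neighbouring chunk.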

### 2️⃣ Expose a Retriever as a LangChain Tool

```python
retriever = vectordb.as_retriever(search_kwargs={"k": 2})

@tool
def rag_search_tool(query: str) -> str:
    """Search the knowledge‑base for relevant chunks"""
    results = retriever.invoke(query)
    return "\n\n".join(d.page_content for d in results)
```
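
For intuition about what `search_kwargs={"k": 2}` asks for, here is a toy top-k similarity search over hand-written vectors. It assumes cosine similarity purely for illustration; the metric a real vector store uses depends on its configuration, and the three-dimensional "embeddings" and the names `cosine_similarity` and `top_k` are made up for this sketch.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        index,
        key=lambda text: cosine_similarity(query_vec, index[text]),
        reverse=True,
    )
    return ranked[:k]


# Made-up vectors; real embeddings have hundreds of dimensions or more.
toy_index = {
    "Refund policy: 30 days": [0.9, 0.1, 0.0],
    "Shipping takes 3-5 days": [0.1, 0.9, 0.0],
    "Refunds go to the original card": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], toy_index, k=2))
```

A small `k` keeps the prompt short but can miss relevant chunks, so it is worth tuning against your own documents.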

### 3️⃣ Optional fallback → real‑time web search

```python
from langchain_tavily import TavilySearch

tavily = TavilySearch(max_results=3, topic="general")

@tool
def web_search_tool(query: str) -> str:
    """Up‑to‑date web info via Tavily"""
    return "\n\n".join(r["content"] for r in tavily.invoke({"query": query})["results"])  # simplified
```

### 4️⃣ Extend the Agent State

```python
class AgentState(State):          # add to previous `State`
    route:    str          # "rag", "answer", "web", "end"
    rag:      str | None   # KB result
    web:      str | None   # web‑search snippets
```

### 5️⃣ Decision / Execution Nodes

| Node | What it does |
| --- | --- |
| **router\_node** | Uses an LLM with structured output to decide the `route` – *rag*, *answer*, or *end*. |
| **rag\_node** | Runs `rag_search_tool`, then asks a *judge* LLM if the chunks are **sufficient**. Sets `route` to *answer* or *web*. |
| **web\_node** | Calls `web_search_tool` and passes snippets along. |
| **answer\_node** | Crafts the final reply, combining any `rag` and/or `web` context. |

Key implementation points (condensed):

```python
from typing import Literal
from pydantic import BaseModel

# ── Structured helpers ─────────────────
class RouteDecision(BaseModel):
    route: Literal["rag", "answer", "end"]
    reply: str | None = None

class RagJudge(BaseModel):
    sufficient: bool

router_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0).with_structured_output(RouteDecision)
judge_llm  = ChatOpenAI(model="gpt-4.1-mini", temperature=0).with_structured_output(RagJudge)
answer_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)

# ── Router ─────────────────────────────
def router_node(state: AgentState) -> AgentState:
    q = state["messages"][-1].content
    decision = router_llm.invoke([
        ("system", "Decide route: rag / answer / end"),
        ("user", q)
    ])
    update = {"route": decision.route}
    if decision.route == "end":
        # Return only the new message; the add_messages reducer appends it to the history
        update["messages"] = [AIMessage(content=decision.reply or "Hello!")]
    return update

# ── RAG lookup ─────────────────────────
def rag_node(state: AgentState) -> AgentState:
    q = state["messages"][-1].content
    chunks = rag_search_tool.invoke(q)
    verdict = judge_llm.invoke([("user", f"Question: {q}\nDocs: {chunks[:300]}…")])
    return {**state, "rag": chunks, "route": "answer" if verdict.sufficient else "web"}

# ── Web search & Answer nodes omitted for brevity (same as notebook) ──
```
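
Since the answer node is omitted here, the following is only a sketch of the context-merging step it would need, under the assumption that the `rag` and `web` fields hold plain strings or `None`. `build_answer_prompt` is a hypothetical helper, not the notebook's code.

```python
from typing import Optional


def build_answer_prompt(
    question: str,
    rag: Optional[str] = None,
    web: Optional[str] = None,
) -> str:
    """Assemble one prompt from whichever context sources are available."""
    sections = []
    if rag:
        sections.append(f"Knowledge-base context:\n{rag}")
    if web:
        sections.append(f"Web search context:\n{web}")
    context = "\n\n".join(sections) if sections else "No external context available."
    return (
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer using the context above when it is relevant."
    )


print(build_answer_prompt(
    "What is the refund window?",
    rag="Refunds are accepted within 30 days.",
))
```

The string it returns could then be sent to `answer_llm` as the user message, with the reply appended to `messages`.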

### 6️⃣ Wire up the Graph

```python
agent_graph = StateGraph(AgentState)
agent_graph.add_node("router",      router_node)
agent_graph.add_node("rag_lookup",  rag_node)
agent_graph.add_node("web_search",  web_node)
agent_graph.add_node("answer",      answer_node)

agent_graph.set_entry_point("router")
agent_graph.add_conditional_edges("router", from_router,
        {"rag": "rag_lookup", "answer": "answer", "end": END})
agent_graph.add_conditional_edges("rag_lookup", after_rag,
        {"answer": "answer", "web": "web_search"})
agent_graph.add_edge("web_search", "answer")
agent_graph.add_edge("answer", END)

agent = agent_graph.compile(checkpointer=MemorySaver())
```
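
The wiring above references `from_router` and `after_rag`, which are not shown in this post. A plausible minimal version, assuming each node has already written its decision into `state["route"]`, simply reads that field back. Treat the bodies below as a guess at their shape rather than the notebook's exact code; `RouteState` is a hypothetical stand-in for the full agent state.

```python
from typing import Literal, TypedDict


class RouteState(TypedDict, total=False):
    """Only the field these helpers need; the real state also carries messages."""
    route: str


def from_router(state: RouteState) -> Literal["rag", "answer", "end"]:
    """Conditional-edge function: follow the route chosen by router_node."""
    route = state.get("route", "answer")
    # Fall back to answering directly if the route is missing or unexpected.
    return route if route in ("rag", "answer", "end") else "answer"


def after_rag(state: RouteState) -> Literal["answer", "web"]:
    """Conditional-edge function: go to web search only if the judge asked for it."""
    return "web" if state.get("route") == "web" else "answer"


print(from_router({"route": "rag"}), after_rag({"route": "web"}))
```

The strings these functions return must match the keys of the mapping passed to `add_conditional_edges`, which is how each decision is translated into the next node.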

### 7️⃣ Quick CLI test

```python
if __name__ == "__main__":
    config = {"configurable": {"thread_id": "thread-12"}}
    while True:
        q = input("You: ").strip()
        if q in {"quit", "exit"}: break
        result = agent.invoke({"messages": [HumanMessage(content=q)]}, config)
        print(result["messages"][-1].content)
```

Now your LangGraph agent:

* **Routes** intelligently
    
* **Retrieves** domain knowledge with RAG
    
* **Falls back** to web search when KB is insufficient
    
* **Remembers** multi-turn conversations per thread via the checkpointer
    

In short, this is a *production‑ready skeleton* you can plug into any project.

---

## 🚀 Conclusion & Resources

In this tutorial we climbed the ladder from **basic LLM calls** ➜ **tool‑aware agents** ➜ **memory** ➜ **RAG** ➜ **full multi‑step routing** with LangGraph. You now have a production‑ready skeleton that can:

* Chat naturally across turns (memory)
    
* Decide when to use internal knowledge vs. external tools (router)
    
* Pull trusted data from your own docs (RAG)
    
* Fall back to real‑time web search when the KB is lacking
    

---

### 📂 Grab the code

* **Full Notebook on GitHub:** [LangGraph RAG Agent Notebook](https://github.com/PradipNichite/Youtube-Tutorials/blob/main/RAG_AI_Agent_using_LangGraph.ipynb)
    

🕹 Try the live RAG Agent: [https://agent.futuresmart.ai/](https://agent.futuresmart.ai/)

### 🎥 Watch the build walkthrough

%[https://youtu.be/60XDTWhklLA] 

---

### What’s next?

1. **Swap in your own docs.** Point the loader at your knowledge base and rebuild the index.
    
2. **Add streaming.** Compiled LangGraph graphs expose `stream()` and `astream()`, so you can pipe partial output to the UI.
    
3. **Deploy.** Package the graph inside a FastAPI endpoint or a serverless function and wire up a front‑end.
    

Got questions or improvement ideas? Drop a comment under the YouTube video – I’d love to hear how you extend this skeleton!

Happy building 🛠️🤖