Lesson 8 — AI Agents and Tool Use | Class 11

Story

Deepika's Research Assistant

Deepika, 16, from Lucknow, spent three hours every day researching for her school debates: searching Google, reading papers, taking notes, and summarising. "Can I automate this?" she wondered after Lesson 7.

A regular ChatGPT prompt wouldn't work — it couldn't search the web or read PDFs. She needed an agent: a system that could decide what tools to use, use them in sequence, and synthesise the results into an answer.

Using LangGraph and OpenAI's tool-calling API, she built a research agent in 150 lines of Python. It searches arXiv, reads abstracts, cross-references sources, and writes a structured summary. What used to take 3 hours now takes 3 minutes. She now uses those 3 hours to actually understand the papers.

Section 1

What is an AI Agent?

A chatbot answers one question per turn. An agent can take multiple actions autonomously to complete a goal. It repeats three steps until the goal is met:

Perceive: Read the current state — user request, previous actions, tool outputs
Plan: Decide what to do next — which tool to use, or whether to give a final answer
Act: Execute the tool call or produce the final response
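
The three steps above can be sketched as a small loop. This is a minimal illustration, not a real agent: the plan() function stands in for the LLM, and lookup_capital is a hypothetical tool invented for this example.

```python
from typing import Callable

# Hypothetical tool for illustration only: returns a canned answer.
def lookup_capital(country: str) -> str:
    capitals = {"India": "New Delhi", "Japan": "Tokyo"}
    return capitals.get(country, "unknown")

TOOLS: dict[str, Callable[[str], str]] = {"lookup_capital": lookup_capital}

def plan(state: dict) -> dict:
    """Stand-in for the LLM: choose the next step from the current state."""
    if not state["observations"]:
        return {"type": "tool", "name": "lookup_capital", "arg": state["request"]}
    return {"type": "final", "answer": f"The capital is {state['observations'][-1]}."}

def run(request: str, max_steps: int = 5) -> str:
    state = {"request": request, "observations": []}
    for _ in range(max_steps):
        # Perceive: `state` holds the request and every tool output so far.
        decision = plan(state)                              # Plan
        if decision["type"] == "final":
            return decision["answer"]                       # Act: final response
        result = TOOLS[decision["name"]](decision["arg"])   # Act: tool call
        state["observations"].append(result)
    return "Max steps reached"

print(run("India"))   # The capital is New Delhi.
```

In a real agent, plan() would be a model call and the state would be the message history, as in Section 2.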

The ReAct (Reason + Act) pattern alternates between thinking (generating a reasoning trace) and acting (calling tools), enabling the agent to use tool outputs to inform its next reasoning step.

Thought: "The user wants recent papers on RAG. I should search arXiv for retrieval-augmented generation papers from 2024."

Action: search_arxiv(query="retrieval-augmented generation 2024", max_results=5)

Observation: [5 paper titles and abstracts returned]

Thought: "I found relevant papers. The most cited one is about corrective RAG. I should read its full abstract and extract key findings."

Action: read_abstract(arxiv_id="2401.15884")

Observation: [abstract text returned]

Final Answer: "Here are the 3 most important RAG papers from 2024 with summaries: ..."
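
When a model writes its actions as plain text like the trace above, the program has to turn each Action line into a real function call. The sketch below shows one possible way to do that with Python's ast module. The function name parse_action is an assumption made for this example, and tool-calling APIs (Section 2) return structured JSON instead, so this parsing is only needed for text-based ReAct prompts.

```python
import ast

def parse_action(line: str) -> tuple[str, dict]:
    """Turn 'Action: tool(arg="x", n=5)' into ('tool', {'arg': 'x', 'n': 5})."""
    call_text = line.removeprefix("Action:").strip()
    node = ast.parse(call_text, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"Not a simple tool call: {call_text!r}")
    if node.args:
        raise ValueError("Only keyword arguments are supported")
    # literal_eval accepts only literals (strings, numbers, lists, ...), never code.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs

name, kwargs = parse_action(
    'Action: search_arxiv(query="retrieval-augmented generation 2024", max_results=5)'
)
print(name, kwargs)
```

The parsed name can then be looked up in a dictionary of allowed tools, so the model can only trigger functions you have registered.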

Section 2

Tool Calling with OpenAI Function Calling

import openai, json

client = openai.OpenAI()

# ── Define tools as JSON schemas ──────────────────────────────────
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web and return top 3 relevant results",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a mathematical expression safely",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Python math expression e.g. '2**10 + 5*3'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

# ── Implement the actual tools ─────────────────────────────────────
def search_web(query: str) -> str:
    """Stub — replace with SerpAPI, Tavily, or DuckDuckGo API."""
    return f"Search results for '{query}': [3 relevant web snippets]"

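def calculate(expression: str) -> str:
    """Evaluate arithmetic only: the AST is checked against a whitelist before eval()."""
    import ast
    try:
        tree = ast.parse(expression, mode='eval')
        # Whitelist: numeric literals and arithmetic operators only
        # (no names, function calls, or attribute access).
        # ast.Num is deprecated since Python 3.8; ast.Constant replaces it.
        allowed_nodes = {
            ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
            ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow,
            ast.USub, ast.UAdd
        }
        for node in ast.walk(tree):
            if type(node) not in allowed_nodes:
                return "Error: unsafe expression"
            # ast.Constant also covers strings and bytes, so check the value type
            if isinstance(node, ast.Constant) and type(node.value) not in (int, float):
                return "Error: unsafe expression"
        # The tree now holds only vetted nodes; builtins are removed as a second guard
        result = eval(compile(tree, "<calculate>", "eval"), {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"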

TOOL_MAP = {"search_web": search_web, "calculate": calculate}

# ── Agent loop ─────────────────────────────────────────────────────
def run_agent(user_message: str) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful research assistant. Use tools to find accurate information."},
        {"role": "user", "content": user_message}
    ]

    for _ in range(10):   # max 10 reasoning steps
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )

        msg = response.choices[0].message

        # If no tool call: agent has a final answer
        if not msg.tool_calls:
            return msg.content

        # Execute each tool call
        messages.append(msg)
        for tc in msg.tool_calls:
            fn_name = tc.function.name
            fn_args = json.loads(tc.function.arguments)
            result  = TOOL_MAP[fn_name](**fn_args)

            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result
            })

    return "Max steps reached"

print(run_agent("What is 2^10 and who wrote the Attention Is All You Need paper?"))
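
The loop above stops after 10 steps. A step limit alone does not bound run time or spending, so the sketch below shows one possible guard that tracks steps, elapsed time, and tokens together. The Budget class and its limits are assumptions made for this example. In the real loop you would call charge() after each API call, passing the token count reported in response.usage.total_tokens.

```python
import time

class BudgetExceeded(Exception):
    """Raised when the agent has used up one of its limits."""

class Budget:
    def __init__(self, max_steps: int, max_seconds: float, max_tokens: int):
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self.max_tokens = max_tokens
        self.steps = 0
        self.tokens = 0
        self.started = time.monotonic()

    def charge(self, tokens_used: int) -> None:
        """Record one model call and raise if any limit is now exceeded."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded("step limit")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token limit")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("time limit")

budget = Budget(max_steps=10, max_seconds=30.0, max_tokens=1000)
try:
    while True:
        budget.charge(tokens_used=400)   # pretend each call used 400 tokens
except BudgetExceeded as exc:
    print(f"Stopped after {budget.steps} calls: {exc}")
```

Here the token limit trips on the third call, well before the step limit would.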

Section 3

Agent Memory Types

In-Context Memory

The message history of the current conversation. It is limited by the model's context window (e.g., 128k tokens for GPT-4o) and is lost when the conversation ends.
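
Because the context window is finite, long conversations need trimming. The sketch below shows one simple strategy: keep the system message and as many of the newest messages as fit. The helper names and the one-token-per-word estimate are assumptions for illustration, and a real tokenizer will give different counts.

```python
def rough_tokens(text: str) -> int:
    """Crude estimate: about one token per word. Real tokenizers differ."""
    return len(text.split())

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system message plus the newest messages that fit in the budget."""
    system, rest = messages[0], messages[1:]
    kept, used = [], rough_tokens(system["content"])
    for msg in reversed(rest):                 # walk from newest to oldest
        cost = rough_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + kept[::-1]               # restore chronological order

history = [
    {"role": "system", "content": "You are a research assistant."},
    {"role": "user", "content": "Find papers on RAG."},
    {"role": "assistant", "content": "Here are five papers on RAG from 2024."},
    {"role": "user", "content": "Summarise the first one."},
]
trimmed = trim_history(history, budget=18)
print([m["role"] for m in trimmed])   # ['system', 'assistant', 'user']
```

The oldest user message is dropped because it no longer fits. With tool calls in the history, a real implementation would also need to keep each tool result together with the assistant message that requested it.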

External / Vector Memory

Past facts stored in a vector database (ChromaDB, Pinecone). Retrieved by similarity search. Survives across sessions.
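
Similarity search compares vectors, not words. The toy sketch below uses made-up three-number "embeddings" to show the idea. A real system would get vectors with hundreds of dimensions from an embedding model, and the numbers here are invented for illustration only.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-D vectors; the axes loosely mean (retrieval, cooking, sport).
notes = {
    "retrieval-augmented generation paper summary": [0.9, 0.1, 0.0],
    "recipe for masala dosa": [0.0, 1.0, 0.1],
    "cricket match highlights": [0.1, 0.0, 0.9],
}
query_vec = [1.0, 0.0, 0.1]   # pretend embedding of "what do I know about RAG?"

ranked = sorted(notes, key=lambda text: cosine(query_vec, notes[text]), reverse=True)
print(ranked[0])   # retrieval-augmented generation paper summary
```

The query shares no words with the top note, yet it ranks first because the two vectors point in nearly the same direction.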

Episodic Memory

Structured log of past agent actions and outcomes. The agent can learn from past mistakes by reviewing its history.
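
One possible shape for such a log is sketched below. The Episode and EpisodicLog names are assumptions made for this example, and the log lives only in memory here. A real agent would save it to a file or database and add the returned text to its prompt at the start of a new task.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    task: str
    action: str
    outcome: str
    success: bool

class EpisodicLog:
    """Append-only record of what the agent tried and how it went."""

    def __init__(self) -> None:
        self.episodes: list[Episode] = []

    def record(self, task: str, action: str, outcome: str, success: bool) -> None:
        self.episodes.append(Episode(task, action, outcome, success))

    def lessons(self, keyword: str) -> str:
        """Summarise past failures on matching tasks as text for the next prompt."""
        failures = [e for e in self.episodes
                    if not e.success and keyword.lower() in e.task.lower()]
        if not failures:
            return "No past failures recorded for this kind of task."
        lines = [f"- '{e.action}' failed: {e.outcome}" for e in failures]
        return "Avoid repeating these:\n" + "\n".join(lines)

log = EpisodicLog()
log.record("find RAG papers", "search_arxiv(query='RAG')",
           "too many unrelated hits", success=False)
log.record("find RAG papers", "search_arxiv(query='retrieval-augmented generation')",
           "5 relevant papers", success=True)
print(log.lessons("rag"))
```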

# External memory with ChromaDB (persistent across sessions)
# pip install chromadb

import chromadb

chroma = chromadb.PersistentClient(path="./agent_memory")
collection = chroma.get_or_create_collection("research_notes")

def remember(key: str, text: str):
    """Store a research note in vector memory."""
    collection.upsert(
        documents=[text],
        ids=[key]
    )

def recall(query: str, n=3) -> list[str]:
    """Retrieve most relevant past notes."""
    results = collection.query(query_texts=[query], n_results=n)
    return results['documents'][0]

# Usage:
remember("rag_2024", "RAG 2024: Corrective RAG paper shows 15% improvement over naive RAG...")
similar_notes = recall("what do I know about RAG?")
print(similar_notes)

Security warning: Never pass user input directly to eval() or exec() in tool implementations. The calculate() tool in Section 2 first parses the expression into an AST and rejects anything outside a whitelist of safe node types, so only plain arithmetic is ever evaluated. An agent that can execute arbitrary Python code is a critical security vulnerability.
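
A stricter variant of the same idea avoids eval() altogether by walking the AST and computing the result directly. The sketch below is one way to do this and is not the lesson's calculate() tool. The name safe_eval and the exponent cap of 100 are assumptions chosen for illustration.

```python
import ast
import operator

# Map each allowed AST operator node to the function that implements it.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg, ast.UAdd: operator.pos,
}

def safe_eval(expression: str):
    """Evaluate +, -, *, /, ** on numbers by walking the AST. No eval() involved."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left, right = walk(node.left), walk(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > 100:
                raise ValueError("exponent too large")   # avoid huge computations
            return OPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval"))

print(safe_eval("2**10 + 5*3"))   # 1039
try:
    safe_eval("__import__('os').system('echo hacked')")
except ValueError as exc:
    print("Rejected:", exc)       # Rejected: disallowed syntax: Call
```

Anything that is not a number or an allowed operator, including function calls and names, is rejected before any computation happens.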

Section 4

LangGraph: State Machine for Agents

# pip install langgraph langchain-openai

from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
import operator

# ── Define agent state ─────────────────────────────────────────────
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    step_count: int

# ── Nodes ──────────────────────────────────────────────────────────
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def search_arxiv(query: str) -> str:
    """Search arXiv for papers on a topic."""
    return f"Papers found for '{query}': [arxiv results]"

@tool
def summarise_abstract(arxiv_id: str) -> str:
    """Get and summarise the abstract of an arXiv paper."""
    return f"Abstract summary for {arxiv_id}: [key findings]"

llm = ChatOpenAI(model="gpt-4o-mini").bind_tools([search_arxiv, summarise_abstract])

def agent_node(state: AgentState) -> AgentState:
    response = llm.invoke(state["messages"])
    return {"messages": [response], "step_count": state["step_count"] + 1}

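MAX_STEPS = 10   # cap on agent turns, mirroring the loop limit in Section 2

def should_continue(state: AgentState) -> str:
    last = state["messages"][-1]
    # Route to the tool node only while the model requests tools and steps remain
    if last.tool_calls and state["step_count"] < MAX_STEPS:
        return "tools"
    return END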

# ── Build graph ───────────────────────────────────────────────────
tool_node = ToolNode([search_arxiv, summarise_abstract])

graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", tool_node)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue)
graph.add_edge("tools", "agent")   # after tools, return to agent

app = graph.compile()

# ── Run ───────────────────────────────────────────────────────────
result = app.invoke({
    "messages": [{"role": "user", "content": "Find 3 key papers on RAG from 2024 and summarise each"}],
    "step_count": 0
})
print(result["messages"][-1].content)

LangGraph vs simple loop: The simple loop in Section 2 works but has no error handling, no branching, and no way to inspect intermediate state. LangGraph models the agent as an explicit state machine — you can add human-in-the-loop nodes, parallel tool execution, conditional routing, and checkpointing for long-running research tasks.
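
To make the "explicit state machine" idea concrete without installing anything, here is a toy sketch in plain Python. It is not LangGraph's API: TinyGraph and the agent and tools functions are invented for this example. It only shows how named nodes, routing functions, and a saved history make every intermediate state inspectable.

```python
from typing import Callable

END = "END"

class TinyGraph:
    """Toy state machine: nodes update a state dict, routers pick the next node."""

    def __init__(self) -> None:
        self.nodes: dict[str, Callable[[dict], dict]] = {}
        self.routers: dict[str, Callable[[dict], str]] = {}
        self.history: list[tuple[str, dict]] = []   # snapshot after every node

    def add_node(self, name: str, fn, router) -> None:
        self.nodes[name] = fn
        self.routers[name] = router

    def run(self, start: str, state: dict, max_steps: int = 20) -> dict:
        current = start
        for _ in range(max_steps):
            state = {**state, **self.nodes[current](state)}   # merge node output
            self.history.append((current, dict(state)))
            current = self.routers[current](state)            # choose next node
            if current == END:
                return state
        raise RuntimeError("step limit reached")

def agent(state: dict) -> dict:
    """Stand-in for the LLM: ask for a tool first, then answer."""
    if state["results"]:
        return {"pending": False, "answer": f"Found: {state['results'][0]}"}
    return {"pending": True}

def tools(state: dict) -> dict:
    """Stand-in for a tool call that returns one result."""
    return {"results": state["results"] + ["paper A"], "pending": False}

g = TinyGraph()
g.add_node("agent", agent, lambda s: "tools" if s["pending"] else END)
g.add_node("tools", tools, lambda s: "agent")

final = g.run("agent", {"results": [], "pending": False})
print(final["answer"])                 # Found: paper A
for name, snapshot in g.history:
    print(name, snapshot)
```

The history list is a crude form of checkpointing: saving it to disk after each node would let a long task resume from the last completed step.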

🤖 Lesson 8 Quiz — AI Agents and Tool Use

1. The ReAct pattern alternates between Thought and Action steps. The key benefit over a single-pass LLM call is:

a) ReAct uses fewer API tokens by batching multiple questions into one request

b) Each Action returns real-world information (tool output) that updates the agent's context. The next Thought can incorporate this new information to plan the next step. This dynamic update cycle allows the agent to handle multi-step tasks where later steps depend on the results of earlier ones — impossible in a single-pass call.

c) The Thought steps run in parallel with Action steps for faster execution

d) ReAct avoids hallucination entirely by never generating text without tool confirmation

2. The calculate() tool uses AST parsing to whitelist safe nodes instead of eval() directly. The reason is:

a) eval() is deprecated in Python 3.11 — AST is the modern replacement

b) Direct eval("__import__('os').system('rm -rf /')") would execute arbitrary OS commands. An agent that passes user-controlled input to eval() is a code execution vulnerability — an attacker could prompt the agent to call calculate() with a malicious expression and gain full system access. AST whitelisting restricts execution to only safe mathematical operations.

c) AST parsing is 10x faster than eval() for mathematical expressions

d) eval() cannot handle Python's ** exponentiation operator

3. When the agent loop receives a response with no tool_calls, it should:

a) Retry the same request until a tool call is generated

b) Return the response as the final answer — the model has determined it has enough information to respond without further tool use. In the LangGraph formulation, this routes to the END node. Continuing to loop would cause unnecessary API calls and potential hallucination from over-thinking.

c) Prompt the user for clarification before proceeding

d) Switch to a different model (like GPT-3.5) for the final synthesis step

4. Vector memory (ChromaDB) retrieves past notes by semantic similarity rather than exact keyword match. This means:

a) It can only retrieve notes that contain the exact words in the query

b) A query for "what do I know about RAG?" can retrieve notes stored as "retrieval-augmented generation paper summary" — even with no word overlap — because both are mapped to nearby points in embedding space. This makes vector memory far more useful than a traditional key-value store for knowledge retrieval.

c) The retrieval is probabilistic and may return completely unrelated notes

d) Vector similarity search requires the query and stored text to be the same language

5. The agent loop includes for _ in range(10) as a safety limit. Without this limit:

a) The agent would crash Python due to stack overflow from recursive tool calls

b) A buggy agent (or adversarial input) could create an infinite loop — endlessly calling tools, accumulating API costs, and never returning. The step limit is a safety guardrail. In production, you also want a time-out and a maximum cost budget in addition to a step limit.

c) OpenAI's API automatically disconnects after 10 sequential calls

d) Without the limit, tool results would overflow the context window after 10 calls

6. LangGraph models agents as state machines with explicit nodes and edges. The key advantage over a simple while loop is:

a) LangGraph runs each node on a separate CPU thread for faster execution

b) The graph structure enables: (1) checkpointing — save/resume long-running agents, (2) human-in-the-loop — pause at any node for human review, (3) parallel branches — fan out to multiple tools simultaneously, (4) observable state — inspect exactly what state the agent is in at any point. None of these are trivial with a simple loop.

c) LangGraph automatically optimises tool call order for minimum API cost

d) LangGraph's type system prevents hallucinated tool names from being called

7. Episodic memory stores a log of past agent actions and outcomes. This enables a capability called:

a) Batch processing — the agent can process multiple tasks simultaneously

b) Learning from experience: if the agent previously tried a search strategy that failed, it can query its episodic memory at the start of a new task, find the failure pattern, and avoid it. This is a form of in-context learning over the agent's own history — enabling continuous improvement without retraining the underlying LLM.

c) The agent can predict what tools will be needed before the user asks

d) Episodic memory reduces latency by caching tool results from previous sessions

8. A research agent is given the task: "Summarise the 5 most cited papers on transformer attention from 2017–2020." The agent should NOT:

a) Use multiple tool calls to search, retrieve, and verify information

b) Generate the paper titles and citation counts from memory without using search tools. LLM training data may be incorrect about citation counts, paper existence, or author names. For factual research tasks, the agent must use tools to ground its claims — hallucinating research papers would be worse than admitting uncertainty.

c) Ask clarifying questions if the task requirements are ambiguous

d) Include its uncertainty when it cannot verify a specific claim
