The Why
... or rather Why not use something like LangChain?
I find that frameworks like LangChain make building agentic LLM applications "feel" a lot more complicated than it actually is. To better understand the agentic underpinnings (tool calling, context, and agent loops), we will forgo these frameworks and build these primitives ourselves.
In this post, we will build Trivi-Al, a tiny chatbot that can ingest a document of your choosing and answer questions about it.
What you will need
- Python 3.10 or newer (the code uses the X | None type-hint syntax)
- An OpenAI API Key
Setting up
1. Creating a virtual environment
From the project folder, run:
```shell
python -m venv venv
```
Then activate it (on Windows, run venv\Scripts\activate instead):
```shell
source venv/bin/activate
```
You should now see something like (venv) at the start of your terminal prompt. This means any packages you install will go into this project environment.
2. Installing dependencies
Create a requirements.txt file and add these packages:
```text
openai
python-dotenv
pytest
```
Here is what each package is for:
- openai lets us call the LLM
- python-dotenv lets us load secrets from a .env file
- pytest lets us write a few tests as the project grows
Now install the dependencies:
```shell
python -m pip install -r requirements.txt
```
3. Setting the API key
Create a .env file in the project root:
```text
OPENAI_API_KEY=your_key_here
```
The load_dotenv() call in main.py will load this automatically. We will get to this later.
Prompts
Create the prompt as a markdown file. This keeps the prompt easy to read and edit without touching the code.
Create a file called prompts/system_prompt.md:
```markdown
Answer the user's question using only information retrieved from the indexed document.
Return only valid JSON.
Do not wrap it in markdown.
Do not include any explanation outside the JSON.
Your response must contain role, content, and tool_call.
Use role='assistant'.
Use content for the natural-language answer to the user.
Before answering any document-related question, you must request exactly one DocStore tool call.
Instructions for searching the document:
1. Extract exactly 1 keyword from the user's query that can be used for search.
2. Use search_document to retrieve k related chunks.
3. Answer based on these retrieved chunks.
Request the tool call by setting tool_call.tool_name to the tool name and tool_call.args to a JSON object containing the tool arguments.
Do not answer document-related questions from memory or prior knowledge.
Before receiving a tool result, leave content empty when requesting a tool call.
When the conversation contains a message beginning with 'Tool result from', use that tool result to answer the user's original question.
After receiving a tool result, answer only from the tool result.
If the tool result does not contain enough information, say that the indexed document does not contain enough information.
Do not request another tool call after receiving a tool result.
After receiving a tool result, set tool_call.tool_name to an empty string and tool_call.args to an empty object.
The JSON must match this schema:
{{ response_format }}
Tools:
{{ tools_registry }}
```
This prompt lays out the 'protocol' the LLM must follow when communicating with the user and with the rest of the code we are about to write.
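The prompt asks for two reply shapes. A tool request has a non-empty tool_name and empty content. A final answer has an empty tool_name and filled content. The sketch below shows both shapes and how code can tell them apart. The example values are made up, and classify is a hypothetical helper that is not part of the post's code.

```python
import json

# Illustrative replies in the shape the system prompt requests (values are made up)
tool_request = (
    '{"role": "assistant", "content": "", '
    '"tool_call": {"tool_name": "search_document", "args": {"query": "trunk", "k": 5}}}'
)
final_answer = (
    '{"role": "assistant", "content": "Their trunks contain over 40,000 muscles.", '
    '"tool_call": {"tool_name": "", "args": {}}}'
)


def classify(raw: str) -> str:
    """Hypothetical helper: decide whether a reply asks for a tool or answers the user."""
    reply = json.loads(raw)
    # The prompt requires all three keys in every reply
    missing = {"role", "content", "tool_call"} - set(reply)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    # A non-empty tool_name means "run this tool, then call me again"
    tool_call = reply["tool_call"] or {}
    return "tool_request" if tool_call.get("tool_name") else "answer"


print(classify(tool_request))  # tool_request
print(classify(final_answer))  # answer
```

The handle_turn function later in the post makes the same non-empty tool_name check.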
Prompt Management
Next, create src/prompt_manager.py, which lets us render our prompts dynamically at runtime.
```python
from __future__ import annotations

from pathlib import Path


class PromptManager:
    def __init__(self, template: str):
        self.template = template

    @classmethod
    def from_file(cls, path: str | Path):
        return cls(Path(path).read_text())

    def render(self, **variables: str) -> str:
        rendered = self.template
        for key, value in variables.items():
            # f"{{{{ {key} }}}}" builds the literal placeholder "{{ key }}"
            rendered = rendered.replace(f"{{{{ {key} }}}}", value)
        return rendered
```
The render method performs variable interpolation at runtime, as shown below:
```python
>>> sample_template = "Hello {{ name }}"
>>> pm = PromptManager(sample_template)
>>> pm.render(name="Bob")
'Hello Bob'
```
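Two details of render are worth knowing:

- A placeholder only matches when it is written exactly as {{ name }}, with one space on each side.
- A placeholder with no matching keyword argument is silently left in the text.

A typo in a variable name would therefore pass a raw placeholder straight to the LLM. The sketch below inlines the same render logic so it runs on its own. It also adds unfilled, a hypothetical helper that is not part of the post's code, which uses a regular expression to list leftover placeholders.

```python
from __future__ import annotations

import re


class PromptManager:
    """Same render logic as src/prompt_manager.py, inlined so this snippet runs on its own."""

    def __init__(self, template: str):
        self.template = template

    def render(self, **variables: str) -> str:
        rendered = self.template
        for key, value in variables.items():
            rendered = rendered.replace(f"{{{{ {key} }}}}", value)
        return rendered


def unfilled(text: str) -> list[str]:
    """Hypothetical helper: names of {{ placeholders }} still present after rendering."""
    return re.findall(r"\{\{\s*(\w+)\s*\}\}", text)


pm = PromptManager("Hello {{ name }}, your topic is {{ topic }}")
partial = pm.render(name="Bob")
print(partial)            # Hello Bob, your topic is {{ topic }}
print(unfilled(partial))  # ['topic']

# Spacing matters: "{{name}}" is not the same string as "{{ name }}", so it is not replaced
print(PromptManager("Hi {{name}}").render(name="Bob"))  # Hi {{name}}
```

Checking that unfilled(system_prompt) is empty right after rendering would be one way to catch a mistyped variable name early.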
Testing the PromptManager
Let us write a test to ensure that prompts can be rendered correctly.
1. Writing the tests
Create tests/test_prompt_manager.py:
```python
import pytest

from src.prompt_manager import PromptManager


@pytest.fixture
def sample_template():
    return """Replace {{ this }} with {{ that }}"""


def test_render(sample_template):
    pm = PromptManager(template=sample_template)
    rendered = pm.render(this="that", that="this")
    assert rendered == """Replace that with this"""
```
2. Running the tests
```shell
python -m pytest -v
```
Retrieval - building a simple DocStore
Out in the wild, many LLM applications retrieve context from a vector database. For our purposes we will build something basic: a simple document store that lets us index a document and search within it.
Create a file called src/doc_store.py:
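```python
class DocStore:
    def __init__(self):
        self.chunks = []

    def index(self, filepath: str) -> int:
        with open(filepath, "r", encoding="utf-8") as f:
            content = f.read()
        # One chunk per paragraph (split on blank lines). Strip whitespace and drop
        # empty pieces, e.g. from a trailing blank line at the end of the file.
        self.chunks = [chunk.strip() for chunk in content.split("\n\n") if chunk.strip()]
        return len(self.chunks)

    def search(self, query: str, k: int = 5) -> list[str]:
        matches = []
        terms = query.lower().split()
        for chunk in self.chunks:
            # Case-insensitive substring match; the chunk keeps its original casing
            chunk_lower = chunk.lower()
            if any(term in chunk_lower for term in terms):
                matches.append(chunk)
        return matches[:k]
```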
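The DocStore provides two methods:
- index splits the document into paragraphs (chunks separated by a blank line) and returns the number of chunks.
- search lowercases the query, splits it into words, and returns up to k chunks that contain any of those words. Matching is a case-insensitive substring check, so "elephant" also matches "elephants".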
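Because search uses substring matching rather than whole-word matching, a short query word can match inside a longer word. The sketch below compares two approaches:

- substring_search is a standalone re-implementation of the same matching rule.
- whole_word_search is a stricter variant built on Python's re module.

Both names are hypothetical and are not part of src/doc_store.py.

```python
import re


def substring_search(chunks: list[str], query: str, k: int = 5) -> list[str]:
    """Same matching rule as DocStore.search: any query word occurring anywhere in the chunk."""
    terms = query.lower().split()
    return [c for c in chunks if any(t in c.lower() for t in terms)][:k]


def whole_word_search(chunks: list[str], query: str, k: int = 5) -> list[str]:
    """Hypothetical stricter variant: a query word must match a whole word."""
    patterns = [re.compile(rf"\b{re.escape(t)}\b") for t in query.lower().split()]
    return [c for c in chunks if any(p.search(c.lower()) for p in patterns)][:k]


chunks = ["Elephants live in herds.", "An ant is small."]

print(len(substring_search(chunks, "ant")))        # 2 ("ant" also hides inside "Elephants")
print(len(whole_word_search(chunks, "ant")))       # 1
print(len(substring_search(chunks, "elephant")))   # 1
print(len(whole_word_search(chunks, "elephant")))  # 0 ("Elephants" is not the word "elephant")
```

Neither approach is strictly better. Whole-word matching avoids false hits like "ant" inside "Elephants", but it also misses plurals, which the substring version catches for free.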
Testing the DocStore
As usual, we write a few tests to ensure everything works as expected. We will test against a real file so our assertions are grounded in known content.
1. Creating a test fixture
Create data/elephants.txt with the following content, leaving a blank line between the four paragraphs:
```text
The African elephant is the largest land animal on Earth.
Adult males can weigh up to 6,350 kilograms and stand 3 to 4 meters tall at the shoulder.
Elephants have large ears that help them regulate body temperature in hot climates.
Their trunks contain over 40,000 muscles and can lift objects weighing up to 350 kilograms.

Elephants are herbivores that consume between 150 to 300 kilograms of food per day.
They spend up to 16 hours daily eating grasses, leaves, bark, and fruit.
Due to their massive size, elephants require vast amounts of water and can drink up to 190 liters in a single day.

African Elephants live in matriarchal family groups led by the oldest female.
These herds typically consist of related females and their offspring.
Male elephants leave the herd when they reach puberty and either live alone or form loose bachelor groups.

Elephants communicate using low-frequency sounds called infrasound that travel several kilometers.
They also use body language, touch, and scent signals to communicate with each other.
Their exceptional memory helps them remember water sources and recognize other elephants after years of separation.
```
This gives us 4 chunks (4 paragraphs), with "elephant" in every chunk and "matriarchal" in exactly one — useful for precise assertions.
2. Writing the tests
Create tests/test_doc_store.py:
```python
import pytest

from src.doc_store import DocStore


@pytest.fixture
def db():
    db = DocStore()
    db.index("./data/elephants.txt")
    return db


def test_index():
    db = DocStore()
    # elephants.txt has 4 paragraphs
    assert db.index("./data/elephants.txt") == 4
    assert len(db.chunks) == 4


def test_search(db):
    # "matriarchal" appears in exactly 1 paragraph
    search_results = db.search(query="matriarchal", k=3)
    assert len(search_results) == 1

    # search is case-insensitive
    search_results = db.search(query="Matriarchal", k=3)
    assert len(search_results) == 1

    # "elephant" appears in all 4 paragraphs but k=1 should limit it to 1
    search_results = db.search(query="elephant", k=1)
    assert len(search_results) == 1

    search_results = db.search(query="nonexistent", k=3)
    assert len(search_results) == 0

    # multi-term query matches chunks containing any term
    search_results = db.search(query="elephant matriarchal", k=10)
    assert len(search_results) == 4
```
3. Running the tests
```shell
python -m pytest -v
```
You should see all tests pass.
Tools
Create src/tools.py to define the available tool and its signature:
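```python
from src.doc_store import DocStore

# Tool name -> arguments the tool accepts, with their default values
TOOLS_REGISTRY = {
    "search_document": {"query": "", "k": 5},
}


def search_document(store: DocStore, query: str, k: int = 5) -> list[str]:
    """Return up to k chunks that contain any word of the query."""
    # k defaults to 5 (matching the registry) in case the LLM omits it
    return store.search(query, k)
```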
TOOLS_REGISTRY describes the tools and their expected arguments. This gets injected into the system prompt so the LLM knows what it can ask for.
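Since the registry is a plain dict, what the LLM actually sees is its JSON serialization. That string is what main.py substitutes for {{ tools_registry }}. The sketch below prints it. It also adds with_defaults, a hypothetical helper that is not part of the original code. It fills in registry defaults for any arguments the LLM leaves out, which is one way to honour the k: 5 default the registry advertises.

```python
import json

TOOLS_REGISTRY = {"search_document": {"query": "", "k": 5}}

# This exact string is what replaces {{ tools_registry }} in the system prompt
registry_text = json.dumps(TOOLS_REGISTRY)
print(registry_text)  # {"search_document": {"query": "", "k": 5}}


def with_defaults(tool_name: str, args: dict) -> dict:
    """Hypothetical helper: fill in registry defaults for arguments the LLM omitted."""
    return {**TOOLS_REGISTRY[tool_name], **args}


print(with_defaults("search_document", {"query": "trunk"}))  # {'query': 'trunk', 'k': 5}
```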
LLM Client
Next, create src/llm_client.py to handle all communication with the LLM:
```python
import json
from dataclasses import dataclass
from typing import Any, Literal

from openai import OpenAI


@dataclass
class Message:
    role: Literal["user", "assistant"]
    content: str
    # Set by the assistant to request a tool: {"tool_name": ..., "args": {...}}
    tool_call: dict[str, Any] | None = None


class LLMClient:
    def __init__(self, client: OpenAI, system_prompt: str):
        self.client = client
        self.system_prompt = system_prompt

    def invoke(self, conversation: list[Message]) -> Message:
        # System prompt first, then every turn so far (role and content only)
        messages = [{"role": "system", "content": self.system_prompt}]
        for message in conversation:
            messages.append({"role": message.role, "content": message.content})

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            # JSON mode: syntactically valid JSON; our schema is enforced only by the prompt
            response_format={"type": "json_object"},
        )
        # message.content may be None; fall back to "" so json.loads raises JSONDecodeError
        result = response.choices[0].message.content or ""
        try:
            parsed_result = json.loads(result)
        except json.JSONDecodeError as err:
            raise ValueError(f"Expected valid JSON from LLM, got: {result}") from err

        return Message(
            role=parsed_result.get("role", "assistant"),
            content=parsed_result.get("content", ""),
            tool_call=parsed_result.get("tool_call"),
        )
```
There are a few things to pay attention to here.
The Message dataclass encapsulates each conversation turn:
- Both the user and the assistant send Messages back and forth, each tagged with the appropriate role.
- It also has a tool_call attribute that the assistant uses to express an intention to use a tool.

The invoke method generates a response from the LLM. Each API call sends three things:
- The model to use to generate the response.
- The system prompt, followed by the entire history of Messages from previous conversation turns. Only role and content are sent; tool_call is not.
- The response format, which puts the API in JSON mode.
The LLMClient formats the conversation, calls the API, and parses the JSON response back into a Message ready for the next conversation turn.
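Because invoke combines message formatting, the API call, and parsing, it can help to look at the formatting and parsing halves without a network call. The sketch below splits them into two hypothetical helpers, build_messages and parse_reply. Neither is part of llm_client.py, but both reuse the same logic. It then feeds them a hand-written tool-request reply. The output shows that the assistant's tool request reaches the next API call only as an empty content string.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any, Literal


@dataclass
class Message:
    role: Literal["user", "assistant"]
    content: str
    tool_call: dict[str, Any] | None = None


def build_messages(system_prompt: str, conversation: list[Message]) -> list[dict[str, str]]:
    """Hypothetical helper: the same formatting LLMClient.invoke does before calling the API."""
    messages = [{"role": "system", "content": system_prompt}]
    for message in conversation:
        # Only role and content are forwarded; tool_call is dropped here
        messages.append({"role": message.role, "content": message.content})
    return messages


def parse_reply(raw: str) -> Message:
    """Hypothetical helper: the same parsing LLMClient.invoke does on the API's reply."""
    parsed = json.loads(raw)
    return Message(role=parsed["role"], content=parsed["content"], tool_call=parsed.get("tool_call"))


reply = parse_reply(
    '{"role": "assistant", "content": "", '
    '"tool_call": {"tool_name": "search_document", "args": {"query": "weigh", "k": 3}}}'
)
conversation = [Message(role="user", content="How heavy are elephants?"), reply]

for m in build_messages("<system prompt>", conversation):
    print(m)
# {'role': 'system', 'content': '<system prompt>'}
# {'role': 'user', 'content': 'How heavy are elephants?'}
# {'role': 'assistant', 'content': ''}
```

The prompt tells the model to leave content empty when requesting a tool. As a result, the follow-up call sees an empty assistant turn followed by the "Tool result from" message, and that works with this prompt. If you want the model to see exactly what it asked for, one possible alternative is to serialize tool_call into the assistant's content.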
Agentic Loop
Now we put it all together in main.py.
1. Imports
Let's add all the imports first to get them out of the way.
```python
# main.py
import json
from pathlib import Path

from dotenv import load_dotenv
from openai import OpenAI

from src.doc_store import DocStore
from src.llm_client import LLMClient, Message
from src.prompt_manager import PromptManager
from src.tools import TOOLS_REGISTRY, search_document
```
2. System Prompt
Build the system prompt by rendering the prompt template with two values: the response format (RESPONSE_FORMAT, defined below) and the tools registry from src/tools.py.
```python
# main.py
RESPONSE_FORMAT = {
    "role": "assistant",
    "content": "",
    "tool_call": {"tool_name": "", "args": {}},
}

PROMPT_FILE = Path(__file__).parent / "prompts" / "system_prompt.md"

system_prompt = PromptManager.from_file(PROMPT_FILE).render(
    response_format=json.dumps(RESPONSE_FORMAT),
    tools_registry=json.dumps(TOOLS_REGISTRY),
)
```
3. The LLM Wrapper
Next, we initialize DocStore and the LLM client.
The load_dotenv function loads OPENAI_API_KEY from the .env file into the environment, where OpenAI() picks it up automatically.
```python
# main.py
docs = DocStore()
_ = docs.index("data/elephants.txt")

_ = load_dotenv()
client = OpenAI()
llm = LLMClient(client, system_prompt)
```
4. handle_turn
This is where our chatbot gets two important abilities: memory and tool calling.
The handle_turn function handles one full turn of the conversation:
- It appends the user message to the conversation list, which serves as our agent's memory.
- It calls the LLM, runs any requested tool, then calls the LLM again with the tool result.
- Because handle_turn appends to the caller's conversation list in place, messages accumulate across turns and the LLM receives the full history on every call.
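```python
# main.py
def handle_turn(
    query: str,
    conversation: list[Message],
    llm_client: LLMClient,
    doc_store: DocStore,
) -> Message:
    # Record the user's message in the shared history (the agent's memory)
    conversation.append(Message(role="user", content=query))

    # First LLM call: either a direct answer or a tool request
    ai_msg = llm_client.invoke(conversation)
    conversation.append(ai_msg)

    # A non-empty tool_name means the LLM wants a tool run before answering
    if ai_msg.tool_call and ai_msg.tool_call.get("tool_name"):
        tool_name = ai_msg.tool_call["tool_name"]
        tool_args = ai_msg.tool_call.get("args") or {}

        if tool_name == "search_document":
            try:
                tool_result = search_document(doc_store, **tool_args)
            except TypeError as err:
                # The LLM sent arguments the tool does not accept
                tool_result = f"Invalid arguments for {tool_name}: {err}"
        else:
            tool_result = f"Unknown tool: {tool_name}"

        # Feed the result back in the format the system prompt describes
        tool_msg = Message(role="user", content=f"Tool result from {tool_name}: {tool_result}")
        conversation.append(tool_msg)

        # Second LLM call: answer using the tool result
        final_msg = llm_client.invoke(conversation)
        conversation.append(final_msg)
        return final_msg

    return ai_msg
```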
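To see the two-call flow without an API key, you can drive handle_turn with stand-ins. The sketch below copies handle_turn's control flow and pairs it with two hypothetical test doubles, neither of which is part of the post's code:

- ScriptedLLM returns canned replies in order.
- FakeDocStore holds a fixed list of chunks.

A tool-using turn should add four messages to the conversation and call the LLM twice.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Message:
    role: str
    content: str
    tool_call: dict[str, Any] | None = None


class ScriptedLLM:
    """Hypothetical stand-in for LLMClient: returns canned replies in order, no API call."""

    def __init__(self, replies: list[Message]):
        self.replies = list(replies)
        self.calls = 0

    def invoke(self, conversation: list[Message]) -> Message:
        self.calls += 1
        return self.replies.pop(0)


class FakeDocStore:
    """Hypothetical stand-in for DocStore holding a fixed list of chunks."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks

    def search(self, query: str, k: int = 5) -> list[str]:
        return [c for c in self.chunks if query.lower() in c.lower()][:k]


def search_document(store, query: str, k: int = 5) -> list[str]:
    return store.search(query, k)


def handle_turn(query, conversation, llm_client, doc_store) -> Message:
    # Mirrors the control flow of handle_turn in main.py
    conversation.append(Message(role="user", content=query))
    ai_msg = llm_client.invoke(conversation)
    conversation.append(ai_msg)
    if ai_msg.tool_call and ai_msg.tool_call.get("tool_name"):
        tool_name = ai_msg.tool_call["tool_name"]
        tool_args = ai_msg.tool_call.get("args", {})
        if tool_name == "search_document":
            tool_result = search_document(doc_store, **tool_args)
        else:
            tool_result = f"Unknown tool: {tool_name}"
        conversation.append(Message(role="user", content=f"Tool result from {tool_name}: {tool_result}"))
        final_msg = llm_client.invoke(conversation)
        conversation.append(final_msg)
        return final_msg
    return ai_msg


llm = ScriptedLLM([
    Message("assistant", "", {"tool_name": "search_document", "args": {"query": "trunk", "k": 1}}),
    Message("assistant", "Their trunks contain over 40,000 muscles.", {"tool_name": "", "args": {}}),
])
store = FakeDocStore(["Their trunks contain over 40,000 muscles.", "Elephants eat grass."])
conversation: list[Message] = []

answer = handle_turn("How strong is the trunk?", conversation, llm, store)
print(answer.content)                  # Their trunks contain over 40,000 muscles.
print([m.role for m in conversation])  # ['user', 'assistant', 'user', 'assistant']
print(conversation[2].content)         # Tool result from search_document: ['Their trunks contain over 40,000 muscles.']
print(llm.calls)                       # 2
```

The same doubles could back a pytest test. However, main.py as written builds the OpenAI client and starts the input loop at import time, so that setup would first need to move under an if __name__ == "__main__": guard.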
5. Agent Loop
Our Agent Loop is a simple while loop.
```python
# main.py
conversation = []

while True:
    user_input = input("User: ")
    if user_input.lower() == "exit":
        break
    response = handle_turn(user_input, conversation, llm, docs)
    print(response.content)
```
Voila! We have a working chatbot.
Running the App
```shell
python main.py
```
Type exit to quit.
Here is Trivi-Al chatting with Al, who wants to learn about elephants:
```text
User: Hi, my name is Al
Hello Al! How can I assist you today?
User: How strong are elephants?
Elephants are incredibly strong animals. Their trunks contain over 40,000 muscles and are capable of lifting objects weighing up to 350 kilograms. Adult male African elephants can weigh up to 6,350 kilograms, highlighting their massive build and strength.
User: What is my name, I seem to have forgotten.
Your name is Al.
User: exit
```
What We Built
In this post we built an LLM agent from scratch — no LangChain, no agent frameworks, just Python.
Let's do a quick recap of the various pieces we built:
- DocStore: a retrieval system that chunks a document and searches it using keyword matching, giving the LLM grounded context instead of relying on its training data.
- PromptManager: a lightweight template engine that injects the tool registry and response schema into the system prompt at runtime.
- Custom tool-calling protocol: rather than using OpenAI's built-in function calling, we defined our own JSON schema so the LLM can express tool requests. This makes the protocol explicit and easy to inspect.
- Agentic loop: a handle_turn function that gives the agent memory (the conversation list) and the ability to act (fetch context, then answer).
The result is a chatbot that can answer questions about a plain-text document you give it (just change the path passed to docs.index), remember the conversation, and tell you when the document does not contain enough information to answer.
Hopefully, this exercise has given you a deeper understanding of how these agents actually work under the hood.
If you want to find a fuller implementation of Trivi-Al, check out the repo.