Building a Chatbot
1. Your First API Call
Every chatbot starts with a single API call. Here's the minimal working version with Claude:
first_call.py

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": "What is gradient descent?"}
        ],
    )
    print(response.content[0].text)
The API is stateless. Each call is independent: the model has no memory of previous calls unless you explicitly include prior messages in the messages array.
2. Conversation Memory
To build a chatbot that remembers what was said, you must accumulate the message history and pass it with every request. This is the messages pattern.
chatbot_memory.py

    import anthropic

    client = anthropic.Anthropic()
    history = []  # grows with each turn

    def chat(user_input):
        history.append({"role": "user", "content": user_input})
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system="You are a concise AI tutor.",
            messages=history,  # full history every time
        )
        assistant_reply = response.content[0].text
        history.append({"role": "assistant", "content": assistant_reply})
        return assistant_reply

    # Usage
    print(chat("What is a neural network?"))
    print(chat("Can you give me an example?"))  # remembers the context
Managing context window limits
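Context windows are finite. As conversations grow, you'll hit the limit or pay more per request, so long histories need trimming. One common strategy is a sliding window over the most recent messages, as used in the full example in section 6.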
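The helper below is a minimal sketch of one such strategy, a sliding window that keeps only the most recent messages. The name trim_history and the default of 20 messages are assumptions for illustration, not part of the Anthropic SDK. The sketch also drops any leading assistant message so the trimmed list opens on a user turn.

```python
def trim_history(history, max_messages=20):
    """Return the most recent messages, starting on a user turn.

    Hypothetical helper for illustration; not part of the Anthropic SDK.
    """
    if max_messages <= 0:
        return []
    recent = history[-max_messages:]
    # Drop leading assistant messages so the window opens with a user message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


# Build a fake 30-message conversation to show the effect.
history = []
for i in range(15):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

window = trim_history(history, max_messages=5)
print(len(window), window[0]["content"])  # prints: 4 question 13
```

The full history can still be stored elsewhere; only the list passed as messages needs to be trimmed.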
3. Tool Use (Function Calling)
Tool use lets the model call your functions — search the web, query a database, send an email. You define the tools; the model decides when to use them.
tool_use.py

    import anthropic

    client = anthropic.Anthropic()

    def fetch_weather_api(city):
        # Placeholder (hypothetical): replace with a call to a real weather service.
        return {"city": city, "forecast": "unknown"}

    # 1. Define tools
    tools = [{
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"],
        },
    }]

    # 2. Send to model
    question = {"role": "user", "content": "What's the weather in Tokyo?"}
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=tools,
        messages=[question],
    )

    # 3. Check if model wants to call a tool
    if response.stop_reason == "tool_use":
        # The tool_use block is not always at a fixed index (a text block
        # may come first), so find it by type.
        tool_call = next(b for b in response.content if b.type == "tool_use")
        city = tool_call.input["city"]

        # 4. Execute the real function
        weather_data = fetch_weather_api(city)

        # 5. Send result back
        final = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=[
                question,
                {"role": "assistant", "content": response.content},
                {"role": "user", "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_call.id,
                    "content": str(weather_data),
                }]},
            ],
        )
Tool safety: always validate and sanitize tool inputs before executing them. The model can be prompted into calling tools with malicious inputs — treat tool calls like untrusted user input.
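For the get_weather tool above, one way to do this is to check the model-supplied input before passing it to the real function. The sketch below is only an illustration: the name validate_city, the 80-character limit, and the allowed punctuation are assumptions, and a real service would pick its own policy.

```python
# Hypothetical policy: letters (any script) plus a little punctuation.
ALLOWED_PUNCTUATION = set(" .'-")
MAX_CITY_LENGTH = 80


def validate_city(tool_input):
    """Return a cleaned city name, or raise ValueError if the input looks wrong."""
    if not isinstance(tool_input, dict):
        raise ValueError("tool input must be an object")
    city = tool_input.get("city")
    if not isinstance(city, str):
        raise ValueError("'city' must be a string")
    city = city.strip()
    if not city or len(city) > MAX_CITY_LENGTH:
        raise ValueError("'city' must be 1 to 80 characters long")
    if not all(ch.isalpha() or ch in ALLOWED_PUNCTUATION for ch in city):
        raise ValueError("'city' contains unexpected characters")
    return city


print(validate_city({"city": " Tokyo "}))
try:
    validate_city({"city": "Tokyo; DROP TABLE users"})
except ValueError as err:
    print("rejected:", err)
```

On a rejection, the error text can be sent back to the model as the tool result instead of executing the call.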
4. Streaming Responses
Streaming sends tokens to the client as they're generated. This dramatically improves perceived latency — users see output start immediately rather than waiting for the full response.
streaming.py

    import anthropic

    client = anthropic.Anthropic()

    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Explain transformers in detail"}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
5. Production Checklist
Rate limiting: Throttle requests per user to prevent runaway API costs. Store usage counters in Redis or a database.
Retries: Wrap API calls in exponential backoff, and handle 429 rate limit errors gracefully.
Input validation: Limit message length, strip control characters, and reject obviously malicious inputs.
Logging: Log every request with token counts and latency. You need this for debugging and cost attribution.
Conversation storage: Store conversation history in a database (Postgres, DynamoDB). In-memory history dies on server restart.
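Two of the items above, stripping control characters and retrying with exponential backoff, can be sketched in plain Python. The names sanitize_message and call_with_backoff, and the default limits and delays, are assumptions for illustration; the demo uses a stand-in function rather than a real API call.

```python
import random
import time
import unicodedata


def sanitize_message(text, max_length=2000):
    """Strip control characters (keeping newlines and tabs) and cap the length."""
    cleaned = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    return cleaned[:max_length]


def call_with_backoff(func, retry_on, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call func(), retrying on the given exception type(s).

    The delay doubles after each failed attempt, with random jitter added.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle it
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Demo with a stand-in function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


result = call_with_backoff(flaky, retry_on=TimeoutError, base_delay=0.01)
print(result, attempts["count"])  # prints: ok 3
print(repr(sanitize_message("hi\x00 there\x07\n")))  # prints: 'hi there\n'
```

With the Anthropic client, the same wrapper could be used by passing retry_on=anthropic.RateLimitError and a function that calls client.messages.create.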
6. Full Chatbot Example
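A compact FastAPI chatbot with streaming, memory, and basic error handling. Sessions are kept in memory here for brevity; as the checklist notes, a production deployment should store them in a database: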
chatbot_api.py

    import json

    import anthropic
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()
    # Async client, so streaming does not block the event loop.
    client = anthropic.AsyncAnthropic()
    sessions = {}  # session_id -> message history

    @app.post("/chat/{session_id}")
    async def chat(session_id: str, body: dict):
        user_msg = body["message"][:2000]  # cap input length
        history = sessions.setdefault(session_id, [])
        history.append({"role": "user", "content": user_msg})

        async def generate():
            full_text = ""
            try:
                async with client.messages.stream(
                    model="claude-sonnet-4-6",
                    max_tokens=1024,
                    system="You are a helpful AI assistant.",
                    # Sliding window; the odd size keeps a user turn first.
                    messages=history[-21:],
                ) as stream:
                    async for text in stream.text_stream:
                        full_text += text
                        yield json.dumps({"delta": text}) + "\n"
            except anthropic.RateLimitError:
                history.pop()  # drop the unanswered user message
                yield json.dumps({"error": "Rate limit hit, please wait."}) + "\n"
                return
            history.append({"role": "assistant", "content": full_text})

        return StreamingResponse(generate(), media_type="application/x-ndjson")
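The endpoint answers with newline-delimited JSON: one object per line, carrying either a "delta" or an "error" key. A client has to reassemble those lines into the full reply. The function below is a minimal sketch of that step, assuming the lines arrive as an iterable of strings (for example, from an HTTP client's streaming response). The name collect_reply is hypothetical.

```python
import json


def collect_reply(lines):
    """Join "delta" chunks from NDJSON lines; raise if an "error" line appears."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if "error" in event:
            raise RuntimeError(event["error"])
        parts.append(event.get("delta", ""))
    return "".join(parts)


# Simulated response body, in the same shape the endpoint yields.
sample = [
    json.dumps({"delta": "Hello"}) + "\n",
    json.dumps({"delta": ", world"}) + "\n",
]
print(collect_reply(sample))  # prints: Hello, world
```

An interactive client would display each delta as it arrives instead of joining them at the end.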
7. What's Next
Your chatbot is live. Next: build an image generation pipeline with Stable Diffusion.