Build a Simple MCP Agent with OpenAI: Describe and OCR Local Images
If you want a practical MCP example that is easy to run and easy to explain, image analysis is a great fit.
Source code: JordiCorbilla/mcp-image-analysis-agent
In this walkthrough, we will build a minimal Model Context Protocol (MCP) setup:
- An MCP server that exposes tools to list images, download files, and analyse local images
- An MCP client that uses OpenAI tool-calling to decide when to call that tool
- A CLI flow that returns both:
- image description
- extracted text (OCR)
This example is intentionally small so you can publish it, teach it, and extend it quickly.
What You Will Build
You will run two Python programs:
- server.py: MCP server with practical tools for image workflows
- client.py: OpenAI-powered agentic client that connects to the server and can invoke MCP tools
The interesting part is the split of responsibilities:
- The client handles conversation, model calls, and tool orchestration.
- The server handles local file access safely and image analysis logic.
Why This Is a Good MCP Example
MCP is most useful when a model needs controlled access to local capabilities. In this project:
- The model cannot directly read files.
- The model must call a tool exposed by your MCP server.
- The server validates paths and performs the action.
This makes the architecture both practical and safer than granting the model unrestricted access to local files.
Prerequisites
- Python 3.10+
- OpenAI API key
- A local image file in the lab folder (for example sample.png)
Project Files
- client.py
- server.py
- requirements.txt
Install Dependencies
From the lab folder:
python -m pip install -r requirements.txt
Set Environment Variables
Use a .env file in the same folder:
OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-4.1-mini
# Optional override for server-side vision model
OPENAI_VISION_MODEL=gpt-4.1-mini
Run the Demo
Start the client and point it to the server:
python .\client.py .\server.py
From the menu:
- Choose Analyse Local Image (Description + OCR)
- Enter an image path relative to the project folder
- Optionally add a focus instruction (for example: extract receipt totals)
How the Agentic Loop Works
The client does this repeatedly:
- Sends the user query plus the available MCP tool definitions to OpenAI.
- If the model requests tool calls, the client executes them through MCP.
- Tool results are appended back into the conversation.
- The model returns the final answer.
That means the model decides when to use tools, while your server decides how tools operate.
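The loop above can be sketched as follows, using the OpenAI Python SDK's chat-completions tool-calling interface. Here `call_mcp_tool` is a placeholder for the MCP session's tool invocation, and the OpenAI client is passed in explicitly so the wiring is visible; the actual client.py in the repo may structure this differently.

```python
import json

def run_agent_turn(client, model, messages, tools, call_mcp_tool):
    """One agentic turn: keep calling the model until it stops requesting tools."""
    while True:
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        message = response.choices[0].message
        if not message.tool_calls:
            # No tool requests left: this is the final answer.
            return message.content
        # Record the assistant's tool-call request, then execute each call via MCP.
        messages.append(message)
        for tool_call in message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = call_mcp_tool(tool_call.function.name, args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result),
            })
```

The key design point is that the model only ever sees tool *results* as conversation messages; the actual execution stays on your side of the loop.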
Core MCP Tools
The server exposes these core tools:
- list_images_in_current_folder()
- download_file_from_url(url, output_path="")
- analyze_local_image(file_path, focus="")
1) List Images in the Current Folder
Use this when you want the agent to discover available images before analysis.
Example request:
List images available in this folder.
Returned data includes:
- count
- images[] with name, relative path, size, and modified timestamp
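A minimal server-side sketch of what this tool can do, using only the standard library. The field names (size_bytes, modified) are illustrative rather than the repo's exact schema:

```python
from pathlib import Path

# Extensions the server treats as images (matches the troubleshooting list below).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp", ".tif", ".tiff"}

def list_images(root: Path) -> dict:
    """Return a count plus per-image metadata for images directly under root."""
    images = []
    for path in sorted(root.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            stat = path.stat()
            images.append({
                "name": path.name,
                "relative_path": str(path.relative_to(root)),
                "size_bytes": stat.st_size,
                "modified": stat.st_mtime,
            })
    return {"count": len(images), "images": images}
```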
2) Download a File from a URL
Use this to bring an image into your project folder before analysis.
Example request:
Download https://example.com/photo.jpg to downloads/photo.jpg.
The tool restricts downloads to http/https and saves only inside your project directory.
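The two restrictions mentioned above can be sketched as a validation step that runs before any bytes are fetched. `validate_download` is a hypothetical helper name, not necessarily what the repo uses:

```python
from pathlib import Path
from urllib.parse import urlparse

def validate_download(url: str, output_path: str, root: Path) -> Path:
    """Reject non-http(s) URLs and destinations outside the project root."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {scheme!r}")
    # Resolve the destination and make sure it stays under the project root.
    destination = (root / output_path).resolve()
    if not destination.is_relative_to(root.resolve()):
        raise ValueError("Destination escapes the project directory")
    return destination
```

Note that the path check happens on the *resolved* destination, so `..` segments in output_path cannot escape the project folder.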
3) Analyse a Local Image
This performs description + OCR in one pass.
Behaviour:
- Validates the path stays inside the project directory.
- Checks that the file exists and is a supported image format.
- Converts image bytes to a data URL.
- Sends image + instruction to OpenAI Vision.
- Returns structured output:
  - description
  - extracted_text
  - notable_details
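The data-URL conversion step in that pipeline can be sketched as below; `image_to_data_url` is an illustrative helper, and the supported-extension set mirrors the troubleshooting section:

```python
import base64
import mimetypes
from pathlib import Path

SUPPORTED = {".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp", ".tif", ".tiff"}

def image_to_data_url(file_path: Path) -> str:
    """Validate the extension, then encode the file as a base64 data URL."""
    if file_path.suffix.lower() not in SUPPORTED:
        raise ValueError(f"Unsupported image extension: {file_path.suffix}")
    mime = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The resulting string can be passed directly as an `image_url` content part in an OpenAI vision request, which is why the server never needs to host the image anywhere.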
Example Prompt
User asks:
Analyze image sample.png. Use MCP tools. Return a short description, extracted text, and two useful follow-up actions.
You can also run a two-step flow:
- Ask the agent to list images with list_images_in_current_folder.
- Pick one and ask the agent to analyse it with analyze_local_image.
Possible response:
- Description: A photographed whiteboard with sprint tasks and dates.
- Extracted text: "Sprint 12, Demo Friday, Fix auth timeout"
- Follow-up actions:
- Convert items into a task checklist.
- Create calendar reminders for deadline lines.
Error Handling Included
This sample handles common failures cleanly:
- Missing OPENAI_API_KEY
- Invalid image path
- Path traversal attempts outside the project folder
- Unsupported image extension
- Non-JSON model output fallback
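The last of these deserves a sketch, because vision models occasionally reply with prose instead of the requested JSON. A defensive parse, with illustrative field names matching the structured output above, might look like this:

```python
import json

def parse_model_output(raw: str) -> dict:
    """Parse the model's JSON reply; fall back to plain text if it isn't valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Treat the whole reply as the description so the client never crashes
        # on a non-JSON answer.
        return {"description": raw.strip(), "extracted_text": "", "notable_details": []}
```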
Security Notes
Even in a demo, do not skip path validation.
The server uses a safe resolver that enforces that all tool file access stays under the project root. This prevents a prompt from tricking your tool into reading arbitrary files outside the workspace.
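Such a resolver is small enough to show in full. This is a minimal sketch, assuming the same resolve-then-compare pattern used for downloads; `resolve_inside_root` is a hypothetical name:

```python
from pathlib import Path

def resolve_inside_root(user_path: str, root: Path) -> Path:
    """Resolve a user-supplied path and refuse anything outside root."""
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root.resolve()):
        raise PermissionError(f"Path escapes project root: {user_path}")
    return candidate
```

Every file-touching tool on the server should route its path argument through this single function, so there is one place to audit rather than one check per tool.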
Cost and Model Tips
- Start with gpt-4.1-mini for lower cost.
- For higher OCR fidelity, test a stronger model in OPENAI_VISION_MODEL.
- Keep outputs concise to reduce tokens.
Troubleshooting
OPENAI_API_KEY is not set
Add the key to .env or set it in your shell, then restart.
Image file not found
Use a path relative to the project folder where you run the client.
Unsupported image extension
Use one of: .png, .jpg, .jpeg, .webp, .gif, .bmp, .tif, .tiff.
No OCR text returned
Some images contain no readable text or low-quality text. Try a clearer image or add focus guidance.
Extension Ideas
Once this baseline works, you can add:
- Batch image processing folder mode.
- Structured OCR outputs (fields for receipts/invoices/forms).
- A second tool to summarize OCR into action items.
- Markdown report generation from analysis results.
Final Thoughts
This project is a compact, publishable MCP example that demonstrates real agent behaviour:
- OpenAI handles reasoning and tool decisions.
- MCP exposes local capabilities safely.
- You get immediately useful output from local images.
If you are teaching MCP, this is a strong first demo before moving on to multi-tool, multi-agent workflows.