Build a Simple MCP Agent with OpenAI: Describe and OCR Local Images

If you want a practical MCP example that is easy to run and easy to explain, image analysis is a great fit.


Source code: JordiCorbilla/mcp-image-analysis-agent

In this walkthrough, we will build a minimal Model Context Protocol (MCP) setup:

  • An MCP server that exposes tools to list images, download files, and analyse local images
  • An MCP client that uses OpenAI tool-calling to decide when to call those tools
  • A CLI flow that returns both:
    • image description
    • extracted text (OCR)

This example is intentionally small so you can publish it, teach it, and extend it quickly.

What You Will Build

You will run two Python programs:

  • server.py: MCP server with practical tools for image workflows
  • client.py: OpenAI-powered agentic client that connects to the server and can invoke MCP tools

The interesting part is the split of responsibilities:

  1. The client handles conversation, model calls, and tool orchestration.
  2. The server handles local file access safely and image analysis logic.

Why This Is a Good MCP Example

MCP is most useful when a model needs controlled access to local capabilities. In this project:

  • The model cannot directly read files.
  • The model must call a tool exposed by your MCP server.
  • The server validates paths and performs the action.

That makes this architecture practical and safer than granting the model direct, unrestricted access to the local filesystem.

Prerequisites

  • Python 3.10+
  • OpenAI API key
  • A local image file in the lab folder (for example sample.png)

Project Files

  • client.py
  • server.py
  • requirements.txt

Install Dependencies

From the lab folder:

python -m pip install -r requirements.txt

Set Environment Variables

Use a .env file in the same folder:

OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-4.1-mini
# Optional override for server-side vision model
OPENAI_VISION_MODEL=gpt-4.1-mini

Run the Demo

Start the client and point it to the server:

python .\client.py .\server.py

From the menu:

  1. Choose Analyse Local Image (Description + OCR)
  2. Enter an image path relative to the project folder
  3. Optionally add a focus instruction (for example: extract receipt totals)

How the Agentic Loop Works

The client does this repeatedly:

  1. Sends the user query plus the available MCP tools to OpenAI.
  2. If the model requests tool calls, executes them through MCP.
  3. Appends the tool results back into the conversation.
  4. Repeats until the model returns a final answer.

That means the model decides when to use tools, while your server decides how tools operate.
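The loop above can be sketched in a few lines of Python. Everything here is illustrative: run_agent_loop, the stubbed fake_model, and fake_tool stand in for the real OpenAI client and MCP session; they are not the project's actual code.

```python
import json

def run_agent_loop(model, call_mcp_tool, messages, tools):
    """Minimal sketch of the client loop. `model` mimics an OpenAI chat
    completion call; `call_mcp_tool` bridges to the MCP session."""
    while True:
        reply = model(messages=messages, tools=tools)   # 1. query + tools to model
        if not reply.get("tool_calls"):
            return reply["content"]                     # 4. final answer
        messages.append(reply)
        for call in reply["tool_calls"]:                # 2. execute via MCP
            result = call_mcp_tool(call["name"], json.loads(call["arguments"]))
            messages.append({                           # 3. append tool results
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })

# Stubbed model: first turn requests a tool, second turn answers.
def fake_model(messages, tools):
    if any(m.get("role") == "tool" for m in messages):
        return {"role": "assistant", "content": "2 images found", "tool_calls": None}
    return {"role": "assistant", "content": None, "tool_calls": [
        {"id": "c1", "name": "list_images_in_current_folder", "arguments": "{}"}]}

def fake_tool(name, args):
    return {"count": 2}

answer = run_agent_loop(fake_model, fake_tool,
                        [{"role": "user", "content": "List images"}], [])
print(answer)  # → 2 images found
```

Swapping the stubs for a real OpenAI client and MCP session keeps the same control flow.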

Core MCP Tools

The server exposes these core tools:

  • list_images_in_current_folder()
  • download_file_from_url(url, output_path="")
  • analyze_local_image(file_path, focus="")

1) List Images in the Current Folder

Use this when you want the agent to discover available images before analysis.

Example request:

List images available in this folder.

Returned data includes:

  • count
  • images[] with name, relative path, size, and modified timestamp
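A minimal sketch of what such a listing tool might look like, assuming a pathlib-based scan; the function name and field names mirror the description above, not the actual server source:

```python
from datetime import datetime, timezone
from pathlib import Path

# Supported extensions, matching the set listed later in this article.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp", ".tif", ".tiff"}

def list_images(root: str = ".") -> dict:
    """Scan `root` recursively and return count plus per-image metadata."""
    root_path = Path(root).resolve()
    images = []
    for p in sorted(root_path.rglob("*")):
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS:
            stat = p.stat()
            images.append({
                "name": p.name,
                "path": str(p.relative_to(root_path)),
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(
                    stat.st_mtime, tz=timezone.utc).isoformat(),
            })
    return {"count": len(images), "images": images}
```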

2) Download a File from a URL

Use this to bring an image into your project folder before analysis.

Example request:

Download https://example.com/photo.jpg to downloads/photo.jpg.

The tool restricts downloads to http/https and saves only inside your project directory.
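The two guard rails (a scheme allow-list and project-directory containment) can be sketched as a standalone validator. validate_download is a hypothetical name; the real tool may structure this differently:

```python
from pathlib import Path
from urllib.parse import urlparse

def validate_download(url: str, output_path: str, root: str = ".") -> Path:
    """Reject non-http(s) URLs and output paths that escape the project root."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    root_path = Path(root).resolve()
    target = (root_path / output_path).resolve()
    if not target.is_relative_to(root_path):  # Python 3.9+
        raise ValueError("output path escapes the project directory")
    return target
```

Resolving before the containment check is the important detail: it normalises symlinks and `..` segments so a path like `../secrets.txt` is rejected.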

3) Analyse a Local Image

This performs description + OCR in one pass.

Behaviour:

  1. Validates the path stays inside the project directory.
  2. Checks that the file exists and is a supported image format.
  3. Converts image bytes to a data URL.
  4. Sends image + instruction to OpenAI Vision.
  5. Returns structured output:
    • description
    • extracted_text
    • notable_details
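Step 3, converting the image bytes to a data URL, is a small base64 helper. This is an illustrative version, not the project's exact code:

```python
import base64
import mimetypes
from pathlib import Path

def image_to_data_url(file_path: str) -> str:
    """Read the image bytes and embed them as a base64 data URL."""
    path = Path(file_path)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The resulting string is what gets passed to the vision model as an image content part alongside the text instruction.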

Example Prompt

User asks:

Analyze image sample.png. Use MCP tools. Return a short description, extracted text, and two useful follow-up actions.

You can also run a two-step flow:


  1. Ask the agent to list images with list_images_in_current_folder.
  2. Pick one and ask the agent to analyse it with analyze_local_image.

Possible response:

  • Description: A photographed whiteboard with sprint tasks and dates.
  • Extracted text: "Sprint 12, Demo Friday, Fix auth timeout"
  • Follow-up actions:
    • Convert items into a task checklist.
    • Create calendar reminders for deadline lines.

Error Handling Included

This sample handles common failures cleanly:

  • Missing OPENAI_API_KEY
  • Invalid image path
  • Path traversal attempts outside project folder
  • Unsupported image extension
  • Non-JSON model output fallback
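The last item, falling back when the model does not return valid JSON, might look like the sketch below. The fallback shape, including the fenced-block recovery, is an assumption rather than the project's exact behaviour:

```python
import json

def parse_model_output(raw: str) -> dict:
    """Try strict JSON first, then a fenced ```json block, then wrap the
    raw text in the expected structure so the caller never crashes."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    if "```" in raw:
        inner = raw.split("```", 2)[1]
        inner = inner.removeprefix("json").strip()
        try:
            return json.loads(inner)
        except json.JSONDecodeError:
            pass
    return {"description": raw.strip(), "extracted_text": "", "notable_details": []}
```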

Security Notes

Even in a demo, do not skip path validation.

The server uses a safe path resolver that keeps all tool file access under the project root. This prevents a crafted prompt from tricking a tool into reading arbitrary files outside the workspace.
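A sketch of such a resolver, assuming Python 3.9+ for Path.is_relative_to; ROOT and resolve_safe are illustrative names, not the project's identifiers:

```python
from pathlib import Path

ROOT = Path(".").resolve()  # project root; illustrative constant

def resolve_safe(user_path: str) -> Path:
    """Resolve symlinks and '..' first, then require the result to stay
    under the project root; raise instead of reading outside it."""
    candidate = (ROOT / user_path).resolve()
    if not candidate.is_relative_to(ROOT):
        raise PermissionError(f"path escapes project root: {user_path}")
    return candidate
```

With this in place, a request for `../../etc/passwd` raises before any file is opened.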

Cost and Model Tips

  • Start with gpt-4.1-mini for lower cost.
  • For higher OCR fidelity, test a stronger model in OPENAI_VISION_MODEL.
  • Keep outputs concise to reduce tokens.

Troubleshooting

OPENAI_API_KEY is not set

Add the key to .env or set it in your shell, then restart.

Image file not found

Use a path relative to the project folder where you run the client.

Unsupported image extension

Use one of: .png, .jpg, .jpeg, .webp, .gif, .bmp, .tif, .tiff.

No OCR text returned

Some images contain no readable text or low-quality text. Try a clearer image or add focus guidance.

Extension Ideas

Once this baseline works, you can add:

  1. Batch image processing folder mode.
  2. Structured OCR outputs (fields for receipts/invoices/forms).
  3. A second tool to summarize OCR into action items.
  4. Markdown report generation from analysis results.

Final Thoughts

This project is a compact, publishable MCP example that demonstrates real agent behaviour:

  • OpenAI handles reasoning and tool decisions.
  • MCP exposes local capabilities safely.
  • You get immediately useful output from local images.

If you are teaching MCP, this is a strong first demo before moving on to multi-tool, multi-agent workflows.
