6.3 KiB

Raw Blame History

MCP Image Recognition Server

An MCP (Model Context Protocol) server that provides AI-powered image analysis tools for AI assistants.

Features

describe_image: Analyze images from base64 encoded data using OpenAI's Vision API
describe_image_from_file: Analyze images from file paths using OpenAI's Vision API
Automatic fallback to basic metadata if OpenAI API is not configured
Automatic Kilocode configuration on installation
Portable and distributable via PyPI

Quick Installation (Recommended)

Install from PyPI (once published):

pip install image-recognition-mcp

The server will automatically configure itself in Kilocode during installation! 🎉

If automatic configuration doesn't work, you can manually run:

image-recognition-mcp-install

Local Development Setup

For local development or if you want to run from source:

cd /home/enne2/Sviluppo/tetris-sdl/mcp-image-server
./run.sh

The script will automatically:

✅ Create virtual environment if it doesn't exist
✅ Install dependencies if needed
✅ Activate the virtual environment
✅ Start the server

Configuration

After installation, you need to add your OpenAI API key:

Open Kilocode's MCP settings: ~/.config/VSCodium/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json
Find the image-recognition server entry
Replace "your-openai-api-key-here" with your actual OpenAI API key
Restart Kilocode

Available Tools

1. describe_image

Analyzes an image from base64 encoded data using OpenAI's GPT-4 Vision.

Parameters:

image_data (string, required): Base64 encoded image data
mime_type (string, optional): MIME type of the image (default: 'image/jpeg')

Returns: Detailed AI-generated description of the image including objects, colors, composition, and visible text

Fallback: If OpenAI API is not configured, returns basic image metadata (size, mode, format)

2. describe_image_from_file

Analyzes an image from a file path using OpenAI's GPT-4 Vision.

Parameters:

file_path (string, required): Path to the image file

Returns: Detailed AI-generated description of the image

Supported formats: JPEG, PNG, GIF, WebP (automatically detected from file extension)

Example Usage

Once configured in Kilocode with a valid OpenAI API key:

Can you analyze the image at /path/to/image.jpg?

The AI will use the describe_image_from_file tool to provide a detailed description.

Installation Methods

Method 1: PyPI (Recommended - Once Published)

pip install image-recognition-mcp

Automatically configures Kilocode! ✨

Method 2: From Source

git clone https://github.com/yourusername/image-recognition-mcp.git
cd image-recognition-mcp
pip install -e .

Method 3: Using uvx (Portable)

uvx image-recognition-mcp

No installation needed! Works like npx for Python.

Kilocode Configuration

The server automatically adds this configuration:

{
  "mcpServers": {
    "image-recognition": {
      "command": "uvx",
      "args": ["image-recognition-mcp"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key-here"
      }
    }
  }
}

Files Structure

mcp-image-server/
├── run.sh                          # Local startup script
├── requirements.txt                # Python dependencies
├── setup.py                        # Package setup (with auto-config)
├── pyproject.toml                  # Modern Python packaging
├── README.md                       # This file
├── PUBLISHING.md                   # Publishing guide
├── LICENSE                         # MIT License
├── MANIFEST.in                     # Package manifest
├── image_server.log               # Server logs
├── venv/                          # Virtual environment (auto-created)
└── image_recognition_server/
    ├── __init__.py
    ├── server.py                  # Main server implementation
    └── install.py                 # Auto-configuration script

Commands

After installation, these commands are available:

image-recognition-mcp - Start the MCP server
image-recognition-mcp-install - Configure Kilocode (runs automatically on install)

Dependencies

fastmcp: FastMCP framework for building MCP servers
pillow: Python Imaging Library for image processing
openai: OpenAI API client for Vision API

Logs

Server logs are written to: /home/enne2/Sviluppo/tetris-sdl/mcp-image-server/image_server.log (local)

Or when installed via pip: ~/.local/share/image-recognition-mcp/logs/ (system-wide)

How It Works

With OpenAI API Key:
- Images are encoded to base64
- Sent to OpenAI's GPT-4o-mini Vision model
- Returns detailed AI-generated descriptions
Without OpenAI API Key:
- Falls back to basic image metadata
- Returns size, color mode, and format information
- Includes a note about configuring the API key

Troubleshooting

Server won't start

Check that Python 3.8+ is installed: python3 --version
Verify installation: pip show image-recognition-mcp
Check logs for errors

Automatic configuration failed

Run manually: image-recognition-mcp-install
Or configure manually (see PUBLISHING.md)

No AI descriptions

Verify your OpenAI API key is correctly set in MCP settings
Check that the key is valid and has credits
Review logs for API errors
The server will show a warning on startup if no valid API key is detected

Image not found

Ensure the file path is absolute
Check file permissions
Verify the file exists: ls -la /path/to/image.jpg

Development

To modify the server:

Clone the repository
Install in development mode: pip install -e .
Make changes to image_recognition_server/server.py
Test locally: image-recognition-mcp

Publishing

See PUBLISHING.md for instructions on publishing to PyPI.

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Future Enhancements

Support for batch image processing
Image comparison tools
Custom vision models
Image generation capabilities
Support for more image formats
Caching for repeated image analyses
Web interface for testing

6.3 KiB Raw Blame History