7.9 KiB

Raw Blame History

MCP Image Recognition Server

An MCP (Model Context Protocol) server that provides AI-powered image analysis tools for AI assistants.

Features

describe_image: Analyze images from base64 encoded data using OpenAI's Vision API
describe_image_from_file: Analyze images from file paths using OpenAI's Vision API
Automatic fallback to basic metadata if OpenAI API is not configured
Automatic Kilocode configuration on installation
Portable and distributable via PyPI

Quick Installation (Recommended)

Install from PyPI (once published):

pip install image-recognition-mcp

The server will automatically configure itself in Kilocode during installation! 🎉

If automatic configuration doesn't work, you can manually run:

image-recognition-mcp-install

Local Development Setup

For local development or if you want to run from source:

cd /home/enne2/Sviluppo/tetris-sdl/mcp-image-server
./run.sh

The script will automatically:

✅ Create virtual environment if it doesn't exist
✅ Install dependencies if needed
✅ Activate the virtual environment
✅ Start the server

Configuration

After installation, you need to add your OpenAI API key:

Open Kilocode's MCP settings: ~/.config/VSCodium/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json
Find the image-recognition server entry
Replace "your-openai-api-key-here" with your actual OpenAI API key
Restart Kilocode

Available Tools

1. describe_image

Analyzes an image from base64 encoded data using OpenAI's GPT-4 Vision.

Parameters:

image_data (string, required): Base64 encoded image data
mime_type (string, optional): MIME type of the image (default: 'image/jpeg')

Returns: Detailed AI-generated description of the image including objects, colors, composition, and visible text

Fallback: If OpenAI API is not configured, returns basic image metadata (size, mode, format)

2. describe_image_from_file

Analyzes an image from a file path using OpenAI's GPT-4 Vision.

Parameters:

file_path (string, required): Path to the image file

Returns: Detailed AI-generated description of the image

Supported formats: JPEG, PNG, GIF, WebP (automatically detected from file extension)

3. ask_image_question

Ask a specific question about an image using AI vision.

Parameters:

file_path (string, required): Path to the image file
prompt (string, required): The question or instruction about the image

Returns: AI response to the specific question about the image

Example usage: "What color is the car in this image?", "How many people are in this photo?", "What text is visible in this image?"

4. generate_image_dalle

Generate images using OpenAI's DALL-E API.

Parameters:

prompt (string, required): Description of the image to generate
size (string, optional): Image size - options: "1024x1024", "1792x1024", "1024x1792" (default: "1024x1024")
quality (string, optional): Image quality - options: "standard", "hd" (default: "standard")
style (string, optional): Image style - options: "vivid", "natural" (default: "vivid")
n (integer, optional): Number of images to generate (1-10, default: 1)

Returns: Generated image URLs and metadata

Example prompts: "A futuristic city skyline at sunset", "A cute robot playing with a cat", "Abstract art with blue and gold colors"

Example Usage

Once configured in Kilocode with a valid OpenAI API key:

Image Analysis:

Can you analyze the image at /path/to/image.jpg?

Ask Specific Questions:

What color is the car in /path/to/photo.jpg?
How many people are visible in /path/to/group_photo.png?
What text can you read in /path/to/document.jpg?

Generate Images:

Generate an image: "A peaceful mountain landscape at sunrise"
Create a high-quality image of "A futuristic robot in a cyberpunk city" in 1792x1024 size

The AI will use the appropriate tools (describe_image_from_file, ask_image_question, or generate_image_dalle) to provide detailed responses.

Installation Methods

Method 1: PyPI (Recommended - Once Published)

pip install image-recognition-mcp

Automatically configures Kilocode! ✨

Method 2: From Source

git clone https://github.com/yourusername/image-recognition-mcp.git
cd image-recognition-mcp
pip install -e .

Method 3: Using uvx (Portable)

uvx image-recognition-mcp

No installation needed! Works like npx for Python.

Kilocode Configuration

The server automatically adds this configuration:

{
  "mcpServers": {
    "image-recognition": {
      "command": "uvx",
      "args": ["image-recognition-mcp"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key-here"
      }
    }
  }
}

Files Structure

mcp-image-server/
├── run.sh                          # Local startup script
├── requirements.txt                # Python dependencies
├── setup.py                        # Package setup (with auto-config)
├── pyproject.toml                  # Modern Python packaging
├── README.md                       # This file
├── PUBLISHING.md                   # Publishing guide
├── LICENSE                         # MIT License
├── MANIFEST.in                     # Package manifest
├── image_server.log               # Server logs
├── venv/                          # Virtual environment (auto-created)
└── image_recognition_server/
    ├── __init__.py
    ├── server.py                  # Main server implementation
    └── install.py                 # Auto-configuration script

Commands

After installation, these commands are available:

image-recognition-mcp - Start the MCP server
image-recognition-mcp-install - Configure Kilocode (runs automatically on install)

Dependencies

fastmcp: FastMCP framework for building MCP servers
pillow: Python Imaging Library for image processing
openai: OpenAI API client for Vision API

Logs

Server logs are written to: /home/enne2/Sviluppo/tetris-sdl/mcp-image-server/image_server.log (local)

Or when installed via pip: ~/.local/share/image-recognition-mcp/logs/ (system-wide)

How It Works

With OpenAI API Key:
- Images are encoded to base64
- Sent to OpenAI's GPT-4o-mini Vision model
- Returns detailed AI-generated descriptions
Without OpenAI API Key:
- Falls back to basic image metadata
- Returns size, color mode, and format information
- Includes a note about configuring the API key

Troubleshooting

Server won't start

Check that Python 3.8+ is installed: python3 --version
Verify installation: pip show image-recognition-mcp
Check logs for errors

Automatic configuration failed

Run manually: image-recognition-mcp-install
Or configure manually (see PUBLISHING.md)

No AI descriptions

Verify your OpenAI API key is correctly set in MCP settings
Check that the key is valid and has credits
Review logs for API errors
The server will show a warning on startup if no valid API key is detected

Image not found

Ensure the file path is absolute
Check file permissions
Verify the file exists: ls -la /path/to/image.jpg

Development

To modify the server:

Clone the repository
Install in development mode: pip install -e .
Make changes to image_recognition_server/server.py
Test locally: image-recognition-mcp

Publishing

See PUBLISHING.md for instructions on publishing to PyPI.

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Future Enhancements

Support for batch image processing
Image comparison tools
Custom vision models
Image generation capabilities
Support for more image formats
Caching for repeated image analyses
Web interface for testing

7.9 KiB Raw Blame History

MCP Image Recognition Server

Features

Quick Installation (Recommended)

Local Development Setup

Configuration

Available Tools

1. describe_image

2. describe_image_from_file

3. ask_image_question

4. generate_image_dalle

Example Usage

Installation Methods

Method 1: PyPI (Recommended - Once Published)

Method 2: From Source

Method 3: Using uvx (Portable)

Kilocode Configuration

Files Structure

Commands

Dependencies

Logs

How It Works

Troubleshooting

Server won't start

Automatic configuration failed

No AI descriptions

Image not found

Development

Publishing

License

Contributing

Future Enhancements

7.9 KiB

Raw Blame History