You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 

7.9 KiB

MCP Image Recognition Server

An MCP (Model Context Protocol) server that provides AI-powered image analysis tools for AI assistants.

Features

  • describe_image: Analyze images from base64 encoded data using OpenAI's Vision API
  • describe_image_from_file: Analyze images from file paths using OpenAI's Vision API
  • Automatic fallback to basic metadata if OpenAI API is not configured
  • Automatic Kilocode configuration on installation
  • Portable and distributable via PyPI

Install from PyPI (once published):

pip install image-recognition-mcp

The server will automatically configure itself in Kilocode during installation! 🎉

If automatic configuration doesn't work, you can manually run:

image-recognition-mcp-install

Local Development Setup

For local development or if you want to run from source:

cd /home/enne2/Sviluppo/tetris-sdl/mcp-image-server
./run.sh

The script will automatically:

  • Create virtual environment if it doesn't exist
  • Install dependencies if needed
  • Activate the virtual environment
  • Start the server

Configuration

After installation, you need to add your OpenAI API key:

  1. Open Kilocode's MCP settings: ~/.config/VSCodium/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json

  2. Find the image-recognition server entry

  3. Replace "your-openai-api-key-here" with your actual OpenAI API key

  4. Restart Kilocode

Available Tools

1. describe_image

Analyzes an image from base64 encoded data using OpenAI's GPT-4 Vision.

Parameters:

  • image_data (string, required): Base64 encoded image data
  • mime_type (string, optional): MIME type of the image (default: 'image/jpeg')

Returns: Detailed AI-generated description of the image including objects, colors, composition, and visible text

Fallback: If OpenAI API is not configured, returns basic image metadata (size, mode, format)

2. describe_image_from_file

Analyzes an image from a file path using OpenAI's GPT-4 Vision.

Parameters:

  • file_path (string, required): Path to the image file

Returns: Detailed AI-generated description of the image

Supported formats: JPEG, PNG, GIF, WebP (automatically detected from file extension)

3. ask_image_question

Ask a specific question about an image using AI vision.

Parameters:

  • file_path (string, required): Path to the image file
  • prompt (string, required): The question or instruction about the image

Returns: AI response to the specific question about the image

Example usage: "What color is the car in this image?", "How many people are in this photo?", "What text is visible in this image?"

4. generate_image_dalle

Generate images using OpenAI's DALL-E API.

Parameters:

  • prompt (string, required): Description of the image to generate
  • size (string, optional): Image size - options: "1024x1024", "1792x1024", "1024x1792" (default: "1024x1024")
  • quality (string, optional): Image quality - options: "standard", "hd" (default: "standard")
  • style (string, optional): Image style - options: "vivid", "natural" (default: "vivid")
  • n (integer, optional): Number of images to generate (1-10, default: 1)

Returns: Generated image URLs and metadata

Example prompts: "A futuristic city skyline at sunset", "A cute robot playing with a cat", "Abstract art with blue and gold colors"

Example Usage

Once configured in Kilocode with a valid OpenAI API key:

Image Analysis:

Can you analyze the image at /path/to/image.jpg?

Ask Specific Questions:

What color is the car in /path/to/photo.jpg?
How many people are visible in /path/to/group_photo.png?
What text can you read in /path/to/document.jpg?

Generate Images:

Generate an image: "A peaceful mountain landscape at sunrise"
Create a high-quality image of "A futuristic robot in a cyberpunk city" in 1792x1024 size

The AI will use the appropriate tools (describe_image_from_file, ask_image_question, or generate_image_dalle) to provide detailed responses.

Installation Methods

pip install image-recognition-mcp

Automatically configures Kilocode!

Method 2: From Source

git clone https://github.com/yourusername/image-recognition-mcp.git
cd image-recognition-mcp
pip install -e .

Method 3: Using uvx (Portable)

uvx image-recognition-mcp

No installation needed! Works like npx for Python.

Kilocode Configuration

The server automatically adds this configuration:

{
  "mcpServers": {
    "image-recognition": {
      "command": "uvx",
      "args": ["image-recognition-mcp"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key-here"
      }
    }
  }
}

Files Structure

mcp-image-server/
├── run.sh                          # Local startup script
├── requirements.txt                # Python dependencies
├── setup.py                        # Package setup (with auto-config)
├── pyproject.toml                  # Modern Python packaging
├── README.md                       # This file
├── PUBLISHING.md                   # Publishing guide
├── LICENSE                         # MIT License
├── MANIFEST.in                     # Package manifest
├── image_server.log               # Server logs
├── venv/                          # Virtual environment (auto-created)
└── image_recognition_server/
    ├── __init__.py
    ├── server.py                  # Main server implementation
    └── install.py                 # Auto-configuration script

Commands

After installation, these commands are available:

  • image-recognition-mcp - Start the MCP server
  • image-recognition-mcp-install - Configure Kilocode (runs automatically on install)

Dependencies

  • fastmcp: FastMCP framework for building MCP servers
  • pillow: Python Imaging Library for image processing
  • openai: OpenAI API client for Vision API

Logs

Server logs are written to: /home/enne2/Sviluppo/tetris-sdl/mcp-image-server/image_server.log (local)

Or when installed via pip: ~/.local/share/image-recognition-mcp/logs/ (system-wide)

How It Works

  1. With OpenAI API Key:

    • Images are encoded to base64
    • Sent to OpenAI's GPT-4o-mini Vision model
    • Returns detailed AI-generated descriptions
  2. Without OpenAI API Key:

    • Falls back to basic image metadata
    • Returns size, color mode, and format information
    • Includes a note about configuring the API key

Troubleshooting

Server won't start

  • Check that Python 3.8+ is installed: python3 --version
  • Verify installation: pip show image-recognition-mcp
  • Check logs for errors

Automatic configuration failed

  • Run manually: image-recognition-mcp-install
  • Or configure manually (see PUBLISHING.md)

No AI descriptions

  • Verify your OpenAI API key is correctly set in MCP settings
  • Check that the key is valid and has credits
  • Review logs for API errors
  • The server will show a warning on startup if no valid API key is detected

Image not found

  • Ensure the file path is absolute
  • Check file permissions
  • Verify the file exists: ls -la /path/to/image.jpg

Development

To modify the server:

  1. Clone the repository
  2. Install in development mode: pip install -e .
  3. Make changes to image_recognition_server/server.py
  4. Test locally: image-recognition-mcp

Publishing

See PUBLISHING.md for instructions on publishing to PyPI.

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Future Enhancements

  • Support for batch image processing
  • Image comparison tools
  • Custom vision models
  • Image generation capabilities
  • Support for more image formats
  • Caching for repeated image analyses
  • Web interface for testing