You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
Matteo Benedetto f02899ee4a Implement code changes to enhance functionality and improve performance 3 months ago
.kilocode Refactor logging configuration to use a dynamic log file path 3 months ago
assets/images Implement code changes to enhance functionality and improve performance 3 months ago
image_recognition_server Add save_path parameter to generate_image_dalle tool 3 months ago
.gitignore first commit 3 months ago
LICENSE first commit 3 months ago
MANIFEST.in first commit 3 months ago
README.md Implement code changes to enhance functionality and improve performance 3 months ago
pyproject.toml Update repository links and remove obsolete files for improved clarity and accessibility 3 months ago
requirements.txt Add save_path parameter to generate_image_dalle tool 3 months ago
run.sh first commit 3 months ago
setup.py Update repository links and remove obsolete files for improved clarity and accessibility 3 months ago

README.md

MCP Image Recognition Server

Logo

An MCP (Model Context Protocol) server that provides AI-powered image analysis tools for AI assistants.

Features

  • describe_image: Analyze images from base64 encoded data using OpenAI's Vision API
  • describe_image_from_file: Analyze images from file paths using OpenAI's Vision API
  • Automatic fallback to basic metadata if OpenAI API is not configured
  • Automatic Kilocode configuration on installation
  • Portable and distributable via PyPI

Install from PyPI (once published):

pip install image-recognition-mcp

The server will automatically configure itself in Kilocode during installation! 🎉

If automatic configuration doesn't work, you can manually run:

image-recognition-mcp-install

Local Development Setup

For local development or if you want to run from source:

cd /home/enne2/Sviluppo/tetris-sdl/mcp-image-server
./run.sh

The script will automatically:

  • Create virtual environment if it doesn't exist
  • Install dependencies if needed
  • Activate the virtual environment
  • Start the server

Configuration

After installation, you need to add your OpenAI API key:

  1. Open Kilocode's MCP settings: ~/.config/VSCodium/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json

  2. Find the image-recognition server entry

  3. Replace "your-openai-api-key-here" with your actual OpenAI API key

  4. Restart Kilocode

Available Tools

1. describe_image

Analyzes an image from base64 encoded data using OpenAI's GPT-4 Vision.

Parameters:

  • image_data (string, required): Base64 encoded image data
  • mime_type (string, optional): MIME type of the image (default: 'image/jpeg')

Returns: Detailed AI-generated description of the image including objects, colors, composition, and visible text

Fallback: If OpenAI API is not configured, returns basic image metadata (size, mode, format)

2. describe_image_from_file

Analyzes an image from a file path using OpenAI's GPT-4 Vision.

Parameters:

  • file_path (string, required): Path to the image file

Returns: Detailed AI-generated description of the image

Supported formats: JPEG, PNG, GIF, WebP (automatically detected from file extension)

3. ask_image_question

Ask a specific question about an image using AI vision.

Parameters:

  • file_path (string, required): Path to the image file
  • prompt (string, required): The question or instruction about the image

Returns: AI response to the specific question about the image

Example usage: "What color is the car in this image?", "How many people are in this photo?", "What text is visible in this image?"

4. generate_image_dalle

Generate images using OpenAI's DALL-E 3 API and save them to a specified path.

Parameters:

  • prompt (string, required): Description of the image to generate
  • save_path (string, required): Absolute path where to save the generated image(s)
  • size (string, optional): Image size - options: "1024x1024", "1792x1024", "1024x1792" (default: "1024x1024")
  • quality (string, optional): Image quality - options: "standard", "hd" (default: "standard")
  • style (string, optional): Image style - options: "vivid", "natural" (default: "vivid")
  • n (integer, optional): Number of images to generate (1-10, default: 1)

Returns: Success message with saved file paths and image metadata

Example usage:

  • Generate single image: prompt="A peaceful mountain landscape", save_path="/home/user/images/mountain.png"
  • Generate multiple images: prompt="Abstract art", save_path="/home/user/art/abstract.png", n=3 (saves as abstract_1.png, abstract_2.png, abstract_3.png)
  • High quality image: prompt="Professional logo", save_path="/home/user/logo.png", quality="hd", size="1792x1024"

Features:

  • Automatically creates directories if they don't exist
  • Downloads and saves images locally from DALL-E URLs
  • Handles multiple images with automatic filename indexing
  • Validates file paths and permissions
  • Reports file sizes and revised prompts

Note: Requires OpenAI API key with DALL-E 3 access. Generated images are saved locally and URLs are temporary.

Example Usage

Once configured in Kilocode with a valid OpenAI API key:

Image Analysis:

Can you analyze the image at /path/to/image.jpg?

Ask Specific Questions:

What color is the car in /path/to/photo.jpg?
How many people are visible in /path/to/group_photo.png?
What text can you read in /path/to/document.jpg?

Generate Images:

Generate an image: "A peaceful mountain landscape at sunrise" and save it to "/home/user/mountain.png"
Create a high-quality image of "A futuristic robot in a cyberpunk city" in 1792x1024 size and save to "/home/user/robot.png"
Generate 3 images of "Abstract geometric patterns" and save to "/home/user/patterns.png"

The AI will use the appropriate tools (describe_image_from_file, ask_image_question, or generate_image_dalle) to provide detailed responses.

Installation Methods

pip install image-recognition-mcp

Automatically configures Kilocode!

Method 2: From Source

git clone https://git.enne2.net/enne2/mcp-image-server.git
cd image-recognition-mcp
pip install -e .

Method 3: Using uvx (Portable)

uvx image-recognition-mcp

No installation needed! Works like npx for Python.

Kilocode Configuration

The server automatically adds this configuration:

    "image-recognition": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://git.enne2.net/enne2/mcp-image-server.git",
        "image-recognition-server"
      ],
      "env": {
        "OPENAI_API_KEY": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      },
      "disabled": false,
      "alwaysAllow": []
    }

Files Structure

mcp-image-server/
├── run.sh                          # Local startup script
├── requirements.txt                # Python dependencies
├── setup.py                        # Package setup (with auto-config)
├── pyproject.toml                  # Modern Python packaging
├── README.md                       # This file
├── PUBLISHING.md                   # Publishing guide
├── LICENSE                         # MIT License
├── MANIFEST.in                     # Package manifest
├── image_server.log               # Server logs
├── venv/                          # Virtual environment (auto-created)
├── assets/                         # Project assets
│   └── images/
│       └── logo.png               # Project logo
└── image_recognition_server/
    ├── __init__.py
    ├── server.py                  # Main server implementation
    └── install.py                 # Auto-configuration script

Commands

After installation, these commands are available:

  • image-recognition-mcp - Start the MCP server
  • image-recognition-mcp-install - Configure Kilocode (runs automatically on install)

Dependencies

  • fastmcp: FastMCP framework for building MCP servers
  • pillow: Python Imaging Library for image processing
  • openai: OpenAI API client for Vision API

Logs

Server logs are written to: /home/enne2/Sviluppo/tetris-sdl/mcp-image-server/image_server.log (local)

Or when installed via pip: ~/.local/share/image-recognition-mcp/logs/ (system-wide)

How It Works

  1. With OpenAI API Key:

    • Images are encoded to base64
    • Sent to OpenAI's GPT-4o-mini Vision model
    • Returns detailed AI-generated descriptions
  2. Without OpenAI API Key:

    • Falls back to basic image metadata
    • Returns size, color mode, and format information
    • Includes a note about configuring the API key

Troubleshooting

Server won't start

  • Check that Python 3.8+ is installed: python3 --version
  • Verify installation: pip show image-recognition-mcp
  • Check logs for errors

Automatic configuration failed

  • Run manually: image-recognition-mcp-install
  • Or configure manually (see PUBLISHING.md)

No AI descriptions

  • Verify your OpenAI API key is correctly set in MCP settings
  • Check that the key is valid and has credits
  • Review logs for API errors
  • The server will show a warning on startup if no valid API key is detected

Image not found

  • Ensure the file path is absolute
  • Check file permissions
  • Verify the file exists: ls -la /path/to/image.jpg

Development

To modify the server:

  1. Clone the repository
  2. Install in development mode: pip install -e .
  3. Make changes to image_recognition_server/server.py
  4. Test locally: image-recognition-mcp

Publishing

See PUBLISHING.md for instructions on publishing to PyPI.

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Future Enhancements

  • Support for batch image processing
  • Image comparison tools
  • Custom vision models
  • Image generation capabilities
  • Support for more image formats
  • Caching for repeated image analyses
  • Web interface for testing