Matteo Benedetto f02899ee4a Implement code changes to enhance functionality and improve performance		4 months ago
.kilocode	Refactor logging configuration to use a dynamic log file path	4 months ago
assets/images	Implement code changes to enhance functionality and improve performance	4 months ago
image_recognition_server	Add save_path parameter to generate_image_dalle tool	4 months ago
.gitignore	first commit	4 months ago
LICENSE	first commit	4 months ago
MANIFEST.in	first commit	4 months ago
README.md	Implement code changes to enhance functionality and improve performance	4 months ago
pyproject.toml	Update repository links and remove obsolete files for improved clarity and accessibility	4 months ago
requirements.txt	Add save_path parameter to generate_image_dalle tool	4 months ago
run.sh	first commit	4 months ago
setup.py	Update repository links and remove obsolete files for improved clarity and accessibility	4 months ago

README.md

MCP Image Recognition Server

An MCP (Model Context Protocol) server that provides AI-powered image analysis tools for AI assistants.

Features

describe_image: Analyze images from base64 encoded data using OpenAI's Vision API
describe_image_from_file: Analyze images from file paths using OpenAI's Vision API
Automatic fallback to basic metadata if OpenAI API is not configured
Automatic Kilocode configuration on installation
Portable and distributable via PyPI

Quick Installation (Recommended)

Install from PyPI (once published):

pip install image-recognition-mcp

The server will automatically configure itself in Kilocode during installation! 🎉

If automatic configuration doesn't work, you can manually run:

image-recognition-mcp-install

Local Development Setup

For local development or if you want to run from source:

cd /home/enne2/Sviluppo/tetris-sdl/mcp-image-server
./run.sh

The script will automatically:

✅ Create virtual environment if it doesn't exist
✅ Install dependencies if needed
✅ Activate the virtual environment
✅ Start the server

Configuration

After installation, you need to add your OpenAI API key:

Open Kilocode's MCP settings: ~/.config/VSCodium/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json
Find the image-recognition server entry
Replace "your-openai-api-key-here" with your actual OpenAI API key
Restart Kilocode

Available Tools

1. describe_image

Analyzes an image from base64 encoded data using OpenAI's GPT-4 Vision.

Parameters:

image_data (string, required): Base64 encoded image data
mime_type (string, optional): MIME type of the image (default: 'image/jpeg')

Returns: Detailed AI-generated description of the image including objects, colors, composition, and visible text

Fallback: If OpenAI API is not configured, returns basic image metadata (size, mode, format)

2. describe_image_from_file

Analyzes an image from a file path using OpenAI's GPT-4 Vision.

Parameters:

file_path (string, required): Path to the image file

Returns: Detailed AI-generated description of the image

Supported formats: JPEG, PNG, GIF, WebP (automatically detected from file extension)

3. ask_image_question

Ask a specific question about an image using AI vision.

Parameters:

file_path (string, required): Path to the image file
prompt (string, required): The question or instruction about the image

Returns: AI response to the specific question about the image

Example usage: "What color is the car in this image?", "How many people are in this photo?", "What text is visible in this image?"

4. generate_image_dalle

Generate images using OpenAI's DALL-E 3 API and save them to a specified path.

Parameters:

prompt (string, required): Description of the image to generate
save_path (string, required): Absolute path where to save the generated image(s)
size (string, optional): Image size - options: "1024x1024", "1792x1024", "1024x1792" (default: "1024x1024")
quality (string, optional): Image quality - options: "standard", "hd" (default: "standard")
style (string, optional): Image style - options: "vivid", "natural" (default: "vivid")
n (integer, optional): Number of images to generate (1-10, default: 1)

Returns: Success message with saved file paths and image metadata

Example usage:

Generate single image: prompt="A peaceful mountain landscape", save_path="/home/user/images/mountain.png"
Generate multiple images: prompt="Abstract art", save_path="/home/user/art/abstract.png", n=3 (saves as abstract_1.png, abstract_2.png, abstract_3.png)
High quality image: prompt="Professional logo", save_path="/home/user/logo.png", quality="hd", size="1792x1024"

Features:

Automatically creates directories if they don't exist
Downloads and saves images locally from DALL-E URLs
Handles multiple images with automatic filename indexing
Validates file paths and permissions
Reports file sizes and revised prompts

Note: Requires OpenAI API key with DALL-E 3 access. Generated images are saved locally and URLs are temporary.

Example Usage

Once configured in Kilocode with a valid OpenAI API key:

Image Analysis:

Can you analyze the image at /path/to/image.jpg?

Ask Specific Questions:

What color is the car in /path/to/photo.jpg?
How many people are visible in /path/to/group_photo.png?
What text can you read in /path/to/document.jpg?

Generate Images:

Generate an image: "A peaceful mountain landscape at sunrise" and save it to "/home/user/mountain.png"
Create a high-quality image of "A futuristic robot in a cyberpunk city" in 1792x1024 size and save to "/home/user/robot.png"
Generate 3 images of "Abstract geometric patterns" and save to "/home/user/patterns.png"

The AI will use the appropriate tools (describe_image_from_file, ask_image_question, or generate_image_dalle) to provide detailed responses.

Installation Methods

Method 1: PyPI (Recommended - Once Published)

pip install image-recognition-mcp

Automatically configures Kilocode! ✨

Method 2: From Source

git clone https://git.enne2.net/enne2/mcp-image-server.git
cd image-recognition-mcp
pip install -e .

Method 3: Using uvx (Portable)

uvx image-recognition-mcp

No installation needed! Works like npx for Python.

Kilocode Configuration

The server automatically adds this configuration:

    "image-recognition": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://git.enne2.net/enne2/mcp-image-server.git",
        "image-recognition-server"
      ],
      "env": {
        "OPENAI_API_KEY": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      },
      "disabled": false,
      "alwaysAllow": []
    }

Files Structure

mcp-image-server/
├── run.sh                          # Local startup script
├── requirements.txt                # Python dependencies
├── setup.py                        # Package setup (with auto-config)
├── pyproject.toml                  # Modern Python packaging
├── README.md                       # This file
├── PUBLISHING.md                   # Publishing guide
├── LICENSE                         # MIT License
├── MANIFEST.in                     # Package manifest
├── image_server.log               # Server logs
├── venv/                          # Virtual environment (auto-created)
├── assets/                         # Project assets
│   └── images/
│       └── logo.png               # Project logo
└── image_recognition_server/
    ├── __init__.py
    ├── server.py                  # Main server implementation
    └── install.py                 # Auto-configuration script

Commands

After installation, these commands are available:

image-recognition-mcp - Start the MCP server
image-recognition-mcp-install - Configure Kilocode (runs automatically on install)

Dependencies

fastmcp: FastMCP framework for building MCP servers
pillow: Python Imaging Library for image processing
openai: OpenAI API client for Vision API

Logs

Server logs are written to: /home/enne2/Sviluppo/tetris-sdl/mcp-image-server/image_server.log (local)

Or when installed via pip: ~/.local/share/image-recognition-mcp/logs/ (system-wide)

How It Works

With OpenAI API Key:
- Images are encoded to base64
- Sent to OpenAI's GPT-4o-mini Vision model
- Returns detailed AI-generated descriptions
Without OpenAI API Key:
- Falls back to basic image metadata
- Returns size, color mode, and format information
- Includes a note about configuring the API key

Troubleshooting

Server won't start

Check that Python 3.8+ is installed: python3 --version
Verify installation: pip show image-recognition-mcp
Check logs for errors

Automatic configuration failed

Run manually: image-recognition-mcp-install
Or configure manually (see PUBLISHING.md)

No AI descriptions

Verify your OpenAI API key is correctly set in MCP settings
Check that the key is valid and has credits
Review logs for API errors
The server will show a warning on startup if no valid API key is detected

Image not found

Ensure the file path is absolute
Check file permissions
Verify the file exists: ls -la /path/to/image.jpg

Development

To modify the server:

Clone the repository
Install in development mode: pip install -e .
Make changes to image_recognition_server/server.py
Test locally: image-recognition-mcp

Publishing

See PUBLISHING.md for instructions on publishing to PyPI.

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Future Enhancements

Support for batch image processing
Image comparison tools
Custom vision models
Image generation capabilities
Support for more image formats
Caching for repeated image analyses
Web interface for testing