# MCP Image Recognition Server

An MCP (Model Context Protocol) server that provides AI-powered image analysis and generation tools for AI assistants.

## Features

- **describe_image**: Analyze images from base64 encoded data using OpenAI's Vision API
- **describe_image_from_file**: Analyze images from file paths using OpenAI's Vision API
- **ask_image_question**: Ask a specific question about an image file
- **generate_image_dalle**: Generate images with OpenAI's DALL-E 3 API and save them locally
- Automatic fallback to basic metadata if the OpenAI API is not configured
- **Automatic Kilocode configuration** on installation
- Portable and distributable via PyPI

## Quick Installation (Recommended)

Install from PyPI (once published):

```bash
pip install image-recognition-mcp
```

The server will **automatically configure itself** in Kilocode during installation! 🎉

If automatic configuration doesn't work, you can run it manually:

```bash
image-recognition-mcp-install
```

## Local Development Setup

For local development, or if you want to run from source:

```bash
cd /home/enne2/Sviluppo/tetris-sdl/mcp-image-server  # replace with the path to your clone
./run.sh
```

The script will automatically:

- ✅ Create the virtual environment if it doesn't exist
- ✅ Install dependencies if needed
- ✅ Activate the virtual environment
- ✅ Start the server

## Configuration

After installation, you need to add your OpenAI API key:

1. Open Kilocode's MCP settings: `~/.config/VSCodium/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json`
2. Find the `image-recognition` server entry
3. Replace `"your-openai-api-key-here"` with your actual OpenAI API key
4. Restart Kilocode

## Available Tools

### 1. describe_image

Analyzes an image from base64 encoded data using OpenAI's GPT-4 Vision.

**Parameters:**

- `image_data` (string, required): Base64 encoded image data
- `mime_type` (string, optional): MIME type of the image (default: 'image/jpeg')

**Returns:** Detailed AI-generated description of the image, including objects, colors, composition, and visible text

**Fallback:** If the OpenAI API is not configured, returns basic image metadata (size, mode, format)

### 2. describe_image_from_file

Analyzes an image from a file path using OpenAI's GPT-4 Vision.

**Parameters:**

- `file_path` (string, required): Path to the image file

**Returns:** Detailed AI-generated description of the image

**Supported formats:** JPEG, PNG, GIF, WebP (automatically detected from file extension)

### 3. ask_image_question

Asks a specific question about an image using AI vision.

**Parameters:**

- `file_path` (string, required): Path to the image file
- `prompt` (string, required): The question or instruction about the image

**Returns:** AI response to the specific question about the image

**Example usage:** "What color is the car in this image?", "How many people are in this photo?", "What text is visible in this image?"

### 4. generate_image_dalle

Generates images using OpenAI's DALL-E 3 API and saves them to a specified path.

**Parameters:**

- `prompt` (string, required): Description of the image to generate
- `save_path` (string, required): Absolute path where the generated image(s) will be saved
- `size` (string, optional): Image size - options: "1024x1024", "1792x1024", "1024x1792" (default: "1024x1024")
- `quality` (string, optional): Image quality - options: "standard", "hd" (default: "standard")
- `style` (string, optional): Image style - options: "vivid", "natural" (default: "vivid")
- `n` (integer, optional): Number of images to generate (1-10, default: 1)

**Returns:** Success message with saved file paths and image metadata

**Example usage:**

- Generate single image: `prompt="A peaceful mountain landscape", save_path="/home/user/images/mountain.png"`
- Generate multiple images: `prompt="Abstract art", save_path="/home/user/art/abstract.png", n=3` (saves as abstract_1.png, abstract_2.png, abstract_3.png)
- High quality image: `prompt="Professional logo", save_path="/home/user/logo.png", quality="hd", size="1792x1024"`

**Features:**

- Automatically creates directories if they don't exist
- Downloads and saves images locally from DALL-E URLs
- Handles multiple images with automatic filename indexing
- Validates file paths and permissions
- Reports file sizes and revised prompts

**Note:** Requires an OpenAI API key with DALL-E 3 access. Generated images are saved locally because the URLs returned by the API are temporary.

## Example Usage

Once configured in Kilocode with a valid OpenAI API key:

**Image Analysis:**

```
Can you analyze the image at /path/to/image.jpg?
```

**Ask Specific Questions:**

```
What color is the car in /path/to/photo.jpg?
How many people are visible in /path/to/group_photo.png?
What text can you read in /path/to/document.jpg?
```

**Generate Images:**

```
Generate an image: "A peaceful mountain landscape at sunrise" and save it to "/home/user/mountain.png"
Create a high-quality image of "A futuristic robot in a cyberpunk city" in 1792x1024 size and save to "/home/user/robot.png"
Generate 3 images of "Abstract geometric patterns" and save to "/home/user/patterns.png"
```

The AI will use the appropriate tools (`describe_image_from_file`, `ask_image_question`, or `generate_image_dalle`) to provide detailed responses.

## Installation Methods

### Method 1: PyPI (Recommended - Once Published)

```bash
pip install image-recognition-mcp
```

Automatically configures Kilocode! ✨

### Method 2: From Source

```bash
git clone https://git.enne2.net/enne2/mcp-image-server.git
cd mcp-image-server
pip install -e .
```

### Method 3: Using uvx (Portable)

```bash
uvx image-recognition-mcp
```

No installation needed! Works like `npx` for Python.
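
For clients that call `describe_image` directly, the `image_data` and `mime_type` arguments listed under Available Tools can be prepared from a local file. The following is a minimal sketch, not the server's own code: `build_describe_image_args` is a hypothetical helper name, and the extension-to-MIME mapping is an assumption based on the supported formats this README lists (JPEG, PNG, GIF, WebP).

```python
import base64
from pathlib import Path

# Assumed mapping for the formats the README lists as supported.
SUPPORTED_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def build_describe_image_args(file_path: str) -> dict:
    """Hypothetical helper: build `describe_image` arguments from a file."""
    path = Path(file_path)
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    # Read the raw bytes and encode them as an ASCII base64 string.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"image_data": encoded, "mime_type": SUPPORTED_MIME_TYPES[suffix]}
```

The returned dictionary uses the two parameter names documented for `describe_image`; how it is sent to the server depends on the MCP client in use.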
## Kilocode Configuration

The server automatically adds this configuration:

```json
"image-recognition": {
  "command": "uvx",
  "args": [
    "--from",
    "git+https://git.enne2.net/enne2/mcp-image-server.git",
    "image-recognition-server"
  ],
  "env": {
    "OPENAI_API_KEY": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  },
  "disabled": false,
  "alwaysAllow": []
}
```

## File Structure

```
mcp-image-server/
├── run.sh                      # Local startup script
├── requirements.txt            # Python dependencies
├── setup.py                    # Package setup (with auto-config)
├── pyproject.toml              # Modern Python packaging
├── README.md                   # This file
├── PUBLISHING.md               # Publishing guide
├── LICENSE                     # MIT License
├── MANIFEST.in                 # Package manifest
├── image_server.log            # Server logs
├── venv/                       # Virtual environment (auto-created)
└── image_recognition_server/
    ├── __init__.py
    ├── server.py               # Main server implementation
    └── install.py              # Auto-configuration script
```

## Commands

After installation, these commands are available:

- `image-recognition-mcp` - Start the MCP server
- `image-recognition-mcp-install` - Configure Kilocode (runs automatically on install)

## Dependencies

- **fastmcp**: FastMCP framework for building MCP servers
- **pillow**: Python Imaging Library for image processing
- **openai**: OpenAI API client for Vision API

## Logs

Server logs are written to:

- `/home/enne2/Sviluppo/tetris-sdl/mcp-image-server/image_server.log` (local)
- `~/.local/share/image-recognition-mcp/logs/` (system-wide, when installed via pip)

## How It Works

1. **With OpenAI API Key:**
   - Images are encoded to base64
   - Sent to OpenAI's GPT-4o-mini Vision model
   - Returns detailed AI-generated descriptions
2. **Without OpenAI API Key:**
   - Falls back to basic image metadata
   - Returns size, color mode, and format information
   - Includes a note about configuring the API key

## Troubleshooting

### Server won't start

- Check that Python 3.8+ is installed: `python3 --version`
- Verify installation: `pip show image-recognition-mcp`
- Check logs for errors

### Automatic configuration failed

- Run manually: `image-recognition-mcp-install`
- Or configure manually (see PUBLISHING.md)

### No AI descriptions

- Verify your OpenAI API key is correctly set in MCP settings
- Check that the key is valid and has credits
- Review logs for API errors
- The server will show a warning on startup if no valid API key is detected

### Image not found

- Ensure the file path is absolute
- Check file permissions
- Verify the file exists: `ls -la /path/to/image.jpg`

## Development

To modify the server:

1. Clone the repository
2. Install in development mode: `pip install -e .`
3. Make changes to `image_recognition_server/server.py`
4. Test locally: `image-recognition-mcp`

## Publishing

See [PUBLISHING.md](PUBLISHING.md) for instructions on publishing to PyPI.

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Future Enhancements

- Support for batch image processing
- Image comparison tools
- Custom vision models
- Image generation capabilities
- Support for more image formats
- Caching for repeated image analyses
- Web interface for testing
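
As an illustration of the "caching for repeated image analyses" idea, one possible approach is to key results on a hash of the image bytes plus the prompt. This is only a sketch under stated assumptions and is not part of the server: `DescriptionCache` and the `analyze` callable are hypothetical names, and the cache is kept in memory only.

```python
import hashlib
from typing import Callable, Dict, Tuple


class DescriptionCache:
    """Illustrative in-memory cache keyed by image content hash and prompt."""

    def __init__(self, analyze: Callable[[bytes, str], str]) -> None:
        # `analyze` stands in for whatever function performs the real analysis.
        self._analyze = analyze
        self._store: Dict[Tuple[str, str], str] = {}

    def describe(self, image_bytes: bytes, prompt: str = "") -> str:
        # Identical bytes and prompt produce the same key, so the
        # underlying analysis runs only once per unique request.
        key = (hashlib.sha256(image_bytes).hexdigest(), prompt)
        if key not in self._store:
            self._store[key] = self._analyze(image_bytes, prompt)
        return self._store[key]
```

Hashing the content rather than the file path means a renamed or copied image still hits the cache, while an edited image at the same path does not.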