|
|
3 months ago | |
|---|---|---|
| .kilocode | 3 months ago | |
| assets/images | 3 months ago | |
| image_recognition_server | 3 months ago | |
| .gitignore | 3 months ago | |
| LICENSE | 3 months ago | |
| MANIFEST.in | 3 months ago | |
| README.md | 3 months ago | |
| pyproject.toml | 3 months ago | |
| requirements.txt | 3 months ago | |
| run.sh | 3 months ago | |
| setup.py | 3 months ago | |
README.md
MCP Image Recognition Server
An MCP (Model Context Protocol) server that provides AI-powered image analysis tools for AI assistants.
Features
- describe_image: Analyze images from base64 encoded data using OpenAI's Vision API
- describe_image_from_file: Analyze images from file paths using OpenAI's Vision API
- Automatic fallback to basic metadata if OpenAI API is not configured
- Automatic Kilocode configuration on installation
- Portable and distributable via PyPI
Quick Installation (Recommended)
Install from PyPI (once published):
pip install image-recognition-mcp
The server will automatically configure itself in Kilocode during installation! 🎉
If automatic configuration doesn't work, you can manually run:
image-recognition-mcp-install
Local Development Setup
For local development or if you want to run from source:
cd /home/enne2/Sviluppo/tetris-sdl/mcp-image-server
./run.sh
The script will automatically:
- ✅ Create virtual environment if it doesn't exist
- ✅ Install dependencies if needed
- ✅ Activate the virtual environment
- ✅ Start the server
Configuration
After installation, you need to add your OpenAI API key:
-
Open Kilocode's MCP settings:
~/.config/VSCodium/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json -
Find the
image-recognitionserver entry -
Replace
"your-openai-api-key-here"with your actual OpenAI API key -
Restart Kilocode
Available Tools
1. describe_image
Analyzes an image from base64 encoded data using OpenAI's GPT-4 Vision.
Parameters:
image_data(string, required): Base64 encoded image datamime_type(string, optional): MIME type of the image (default: 'image/jpeg')
Returns: Detailed AI-generated description of the image including objects, colors, composition, and visible text
Fallback: If OpenAI API is not configured, returns basic image metadata (size, mode, format)
2. describe_image_from_file
Analyzes an image from a file path using OpenAI's GPT-4 Vision.
Parameters:
file_path(string, required): Path to the image file
Returns: Detailed AI-generated description of the image
Supported formats: JPEG, PNG, GIF, WebP (automatically detected from file extension)
3. ask_image_question
Ask a specific question about an image using AI vision.
Parameters:
file_path(string, required): Path to the image fileprompt(string, required): The question or instruction about the image
Returns: AI response to the specific question about the image
Example usage: "What color is the car in this image?", "How many people are in this photo?", "What text is visible in this image?"
4. generate_image_dalle
Generate images using OpenAI's DALL-E 3 API and save them to a specified path.
Parameters:
prompt(string, required): Description of the image to generatesave_path(string, required): Absolute path where to save the generated image(s)size(string, optional): Image size - options: "1024x1024", "1792x1024", "1024x1792" (default: "1024x1024")quality(string, optional): Image quality - options: "standard", "hd" (default: "standard")style(string, optional): Image style - options: "vivid", "natural" (default: "vivid")n(integer, optional): Number of images to generate (1-10, default: 1)
Returns: Success message with saved file paths and image metadata
Example usage:
- Generate single image:
prompt="A peaceful mountain landscape", save_path="/home/user/images/mountain.png" - Generate multiple images:
prompt="Abstract art", save_path="/home/user/art/abstract.png", n=3(saves as abstract_1.png, abstract_2.png, abstract_3.png) - High quality image:
prompt="Professional logo", save_path="/home/user/logo.png", quality="hd", size="1792x1024"
Features:
- Automatically creates directories if they don't exist
- Downloads and saves images locally from DALL-E URLs
- Handles multiple images with automatic filename indexing
- Validates file paths and permissions
- Reports file sizes and revised prompts
Note: Requires OpenAI API key with DALL-E 3 access. Generated images are saved locally and URLs are temporary.
Example Usage
Once configured in Kilocode with a valid OpenAI API key:
Image Analysis:
Can you analyze the image at /path/to/image.jpg?
Ask Specific Questions:
What color is the car in /path/to/photo.jpg?
How many people are visible in /path/to/group_photo.png?
What text can you read in /path/to/document.jpg?
Generate Images:
Generate an image: "A peaceful mountain landscape at sunrise" and save it to "/home/user/mountain.png"
Create a high-quality image of "A futuristic robot in a cyberpunk city" in 1792x1024 size and save to "/home/user/robot.png"
Generate 3 images of "Abstract geometric patterns" and save to "/home/user/patterns.png"
The AI will use the appropriate tools (describe_image_from_file, ask_image_question, or generate_image_dalle) to provide detailed responses.
Installation Methods
Method 1: PyPI (Recommended - Once Published)
pip install image-recognition-mcp
Automatically configures Kilocode! ✨
Method 2: From Source
git clone https://git.enne2.net/enne2/mcp-image-server.git
cd image-recognition-mcp
pip install -e .
Method 3: Using uvx (Portable)
uvx image-recognition-mcp
No installation needed! Works like npx for Python.
Kilocode Configuration
The server automatically adds this configuration:
"image-recognition": {
"command": "uvx",
"args": [
"--from",
"git+https://git.enne2.net/enne2/mcp-image-server.git",
"image-recognition-server"
],
"env": {
"OPENAI_API_KEY": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
"disabled": false,
"alwaysAllow": []
}
Files Structure
mcp-image-server/
├── run.sh # Local startup script
├── requirements.txt # Python dependencies
├── setup.py # Package setup (with auto-config)
├── pyproject.toml # Modern Python packaging
├── README.md # This file
├── PUBLISHING.md # Publishing guide
├── LICENSE # MIT License
├── MANIFEST.in # Package manifest
├── image_server.log # Server logs
├── venv/ # Virtual environment (auto-created)
├── assets/ # Project assets
│ └── images/
│ └── logo.png # Project logo
└── image_recognition_server/
├── __init__.py
├── server.py # Main server implementation
└── install.py # Auto-configuration script
Commands
After installation, these commands are available:
image-recognition-mcp- Start the MCP serverimage-recognition-mcp-install- Configure Kilocode (runs automatically on install)
Dependencies
- fastmcp: FastMCP framework for building MCP servers
- pillow: Python Imaging Library for image processing
- openai: OpenAI API client for Vision API
Logs
Server logs are written to:
/home/enne2/Sviluppo/tetris-sdl/mcp-image-server/image_server.log (local)
Or when installed via pip:
~/.local/share/image-recognition-mcp/logs/ (system-wide)
How It Works
-
With OpenAI API Key:
- Images are encoded to base64
- Sent to OpenAI's GPT-4o-mini Vision model
- Returns detailed AI-generated descriptions
-
Without OpenAI API Key:
- Falls back to basic image metadata
- Returns size, color mode, and format information
- Includes a note about configuring the API key
Troubleshooting
Server won't start
- Check that Python 3.8+ is installed:
python3 --version - Verify installation:
pip show image-recognition-mcp - Check logs for errors
Automatic configuration failed
- Run manually:
image-recognition-mcp-install - Or configure manually (see PUBLISHING.md)
No AI descriptions
- Verify your OpenAI API key is correctly set in MCP settings
- Check that the key is valid and has credits
- Review logs for API errors
- The server will show a warning on startup if no valid API key is detected
Image not found
- Ensure the file path is absolute
- Check file permissions
- Verify the file exists:
ls -la /path/to/image.jpg
Development
To modify the server:
- Clone the repository
- Install in development mode:
pip install -e . - Make changes to
image_recognition_server/server.py - Test locally:
image-recognition-mcp
Publishing
See PUBLISHING.md for instructions on publishing to PyPI.
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Future Enhancements
- Support for batch image processing
- Image comparison tools
- Custom vision models
- Image generation capabilities
- Support for more image formats
- Caching for repeated image analyses
- Web interface for testing