# MCP Image Recognition Server

An MCP (Model Context Protocol) server that provides AI-powered image analysis tools for AI assistants.

## Features

- **describe_image**: Analyze images from base64 encoded data using OpenAI's Vision API
- **describe_image_from_file**: Analyze images from file paths using OpenAI's Vision API
- Automatic fallback to basic metadata if OpenAI API is not configured
- **Automatic Kilocode configuration** on installation
- Portable and distributable via PyPI

## Quick Installation (Recommended)

Install from PyPI (once published):

```bash
pip install image-recognition-mcp
```

The server will **automatically configure itself** in Kilocode during installation! 🎉

If automatic configuration doesn't work, you can manually run:

```bash
image-recognition-mcp-install
```

## Local Development Setup

For local development or if you want to run from source:

```bash
cd /home/enne2/Sviluppo/tetris-sdl/mcp-image-server
./run.sh
```

The script will automatically:
- ✅ Create virtual environment if it doesn't exist
- ✅ Install dependencies if needed
- ✅ Activate the virtual environment
- ✅ Start the server

## Configuration

After installation, you need to add your OpenAI API key:

1. Open Kilocode's MCP settings:
   `~/.config/VSCodium/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json`

2. Find the `image-recognition` server entry

3. Replace `"your-openai-api-key-here"` with your actual OpenAI API key

4. Restart Kilocode

## Available Tools

### 1. describe_image
Analyzes an image from base64 encoded data using OpenAI's GPT-4 Vision.

**Parameters:**
- `image_data` (string, required): Base64 encoded image data
- `mime_type` (string, optional): MIME type of the image (default: 'image/jpeg')

**Returns:** Detailed AI-generated description of the image including objects, colors, composition, and visible text

**Fallback:** If OpenAI API is not configured, returns basic image metadata (size, mode, format)

### 2. describe_image_from_file
Analyzes an image from a file path using OpenAI's GPT-4 Vision.

**Parameters:**
- `file_path` (string, required): Path to the image file

**Returns:** Detailed AI-generated description of the image

**Supported formats:** JPEG, PNG, GIF, WebP (automatically detected from file extension)

### 3. ask_image_question
Ask a specific question about an image using AI vision.

**Parameters:**
- `file_path` (string, required): Path to the image file
- `prompt` (string, required): The question or instruction about the image

**Returns:** AI response to the specific question about the image

**Example usage:** "What color is the car in this image?", "How many people are in this photo?", "What text is visible in this image?"

### 4. generate_image_dalle
Generate images using OpenAI's DALL-E 3 API and save them to a specified path.

**Parameters:**
- `prompt` (string, required): Description of the image to generate
- `save_path` (string, required): Absolute path where to save the generated image(s)
- `size` (string, optional): Image size - options: "1024x1024", "1792x1024", "1024x1792" (default: "1024x1024")
- `quality` (string, optional): Image quality - options: "standard", "hd" (default: "standard")
- `style` (string, optional): Image style - options: "vivid", "natural" (default: "vivid")
- `n` (integer, optional): Number of images to generate (1-10, default: 1)

**Returns:** Success message with saved file paths and image metadata

**Example usage:** 
- Generate single image: `prompt="A peaceful mountain landscape", save_path="/home/user/images/mountain.png"`
- Generate multiple images: `prompt="Abstract art", save_path="/home/user/art/abstract.png", n=3` (saves as abstract_1.png, abstract_2.png, abstract_3.png)
- High quality image: `prompt="Professional logo", save_path="/home/user/logo.png", quality="hd", size="1792x1024"`

**Features:**
- Automatically creates directories if they don't exist
- Downloads and saves images locally from DALL-E URLs
- Handles multiple images with automatic filename indexing
- Validates file paths and permissions
- Reports file sizes and revised prompts

**Note:** Requires OpenAI API key with DALL-E 3 access. Generated images are saved locally and URLs are temporary.

## Example Usage

Once configured in Kilocode with a valid OpenAI API key:

**Image Analysis:**
```
Can you analyze the image at /path/to/image.jpg?
```

**Ask Specific Questions:**
```
What color is the car in /path/to/photo.jpg?
How many people are visible in /path/to/group_photo.png?
What text can you read in /path/to/document.jpg?
```

**Generate Images:**
```
Generate an image: "A peaceful mountain landscape at sunrise" and save it to "/home/user/mountain.png"
Create a high-quality image of "A futuristic robot in a cyberpunk city" in 1792x1024 size and save to "/home/user/robot.png"
Generate 3 images of "Abstract geometric patterns" and save to "/home/user/patterns.png"
```

The AI will use the appropriate tools (`describe_image_from_file`, `ask_image_question`, or `generate_image_dalle`) to provide detailed responses.

## Installation Methods

### Method 1: PyPI (Recommended - Once Published)

```bash
pip install image-recognition-mcp
```

Automatically configures Kilocode! ✨

### Method 2: From Source

```bash
git clone https://git.enne2.net/enne2/mcp-image-server.git
cd image-recognition-mcp
pip install -e .
```

### Method 3: Using uvx (Portable)

```bash
uvx image-recognition-mcp
```

No installation needed! Works like `npx` for Python.

## Kilocode Configuration

The server automatically adds this configuration:

```json
    "image-recognition": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://git.enne2.net/enne2/mcp-image-server.git",
        "image-recognition-server"
      ],
      "env": {
        "OPENAI_API_KEY": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      },
      "disabled": false,
      "alwaysAllow": []
    }
```

## Files Structure

```
mcp-image-server/
├── run.sh                          # Local startup script
├── requirements.txt                # Python dependencies
├── setup.py                        # Package setup (with auto-config)
├── pyproject.toml                  # Modern Python packaging
├── README.md                       # This file
├── PUBLISHING.md                   # Publishing guide
├── LICENSE                         # MIT License
├── MANIFEST.in                     # Package manifest
├── image_server.log               # Server logs
├── venv/                          # Virtual environment (auto-created)
└── image_recognition_server/
    ├── __init__.py
    ├── server.py                  # Main server implementation
    └── install.py                 # Auto-configuration script
```

## Commands

After installation, these commands are available:

- `image-recognition-mcp` - Start the MCP server
- `image-recognition-mcp-install` - Configure Kilocode (runs automatically on install)

## Dependencies

- **fastmcp**: FastMCP framework for building MCP servers
- **pillow**: Python Imaging Library for image processing
- **openai**: OpenAI API client for Vision API

## Logs

Server logs are written to:
`/home/enne2/Sviluppo/tetris-sdl/mcp-image-server/image_server.log` (local)

Or when installed via pip:
`~/.local/share/image-recognition-mcp/logs/` (system-wide)

## How It Works

1. **With OpenAI API Key:**
   - Images are encoded to base64
   - Sent to OpenAI's GPT-4o-mini Vision model
   - Returns detailed AI-generated descriptions

2. **Without OpenAI API Key:**
   - Falls back to basic image metadata
   - Returns size, color mode, and format information
   - Includes a note about configuring the API key

## Troubleshooting

### Server won't start
- Check that Python 3.8+ is installed: `python3 --version`
- Verify installation: `pip show image-recognition-mcp`
- Check logs for errors

### Automatic configuration failed
- Run manually: `image-recognition-mcp-install`
- Or configure manually (see PUBLISHING.md)

### No AI descriptions
- Verify your OpenAI API key is correctly set in MCP settings
- Check that the key is valid and has credits
- Review logs for API errors
- The server will show a warning on startup if no valid API key is detected

### Image not found
- Ensure the file path is absolute
- Check file permissions
- Verify the file exists: `ls -la /path/to/image.jpg`

## Development

To modify the server:

1. Clone the repository
2. Install in development mode: `pip install -e .`
3. Make changes to `image_recognition_server/server.py`
4. Test locally: `image-recognition-mcp`

## Publishing

See [PUBLISHING.md](PUBLISHING.md) for instructions on publishing to PyPI.

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Future Enhancements

- Support for batch image processing
- Image comparison tools
- Custom vision models
- Image generation capabilities
- Support for more image formats
- Caching for repeated image analyses
- Web interface for testing