
feat: Enhance server functionality with detailed command execution instructions and improve test report documentation

main
Matteo Benedetto 3 months ago
parent
commit
37e7517b62
  1. 383
      TEST_REPORT.md
  2. 30
      src/gnome_vte_mcp/gui.py
  3. 281
      src/gnome_vte_mcp/server.py

383
TEST_REPORT.md

@ -0,0 +1,383 @@
# mcp-hal9002 Test Report
Date: 15 March 2026 (updated 16 March 2026)
## Objective
Track the test scenarios run against the `mcp-hal9002` MCP tools, with outcomes, operational notes, and observed anomalies. This document is updated after every significant test campaign.
## Scope Covered
Tools covered during the sessions:
- `gui_status()`
- `open_gui()`, implicitly via `open_tab(...)`
- `close_gui()`
- `open_tab(...)`
- `list_tabs()`
- `focus_tab(...)`
- `exec_command(...)`
- `read_tab(...)`
- `read_last_command_result(...)`
- `wait_for_command(...)`
- `wait_for_running_command(...)`
- `wait_for_command_result(...)`
- `wait_for_prompt(...)`
- `capture_screenshot(...)`
- `close_tab(...)`
## Scenario Matrix
### 1. GUI Lifecycle (T1)
Initial state verified with the GUI already running and leftover tabs from previous tests.
Scenarios executed:
- full shutdown of the shared GUI with `close_gui()`
- check for `running=false` after `close_gui()`
- implicit restart of the GUI via `open_tab(...)`
- check that the shared instance is reused via the local socket
- `gui_status()` with the GUI stopped → `running=false`
- `gui_status()` with the GUI running → geometry fields and `tab_count` present
Result: `PASS`
Notes:
- the GUI shuts down cleanly and the socket is removed
- a new `open_tab(...)` relaunches the GUI as expected
- `gui_status()` does not launch the GUI, which is the correct behavior
### 2. Tab Lifecycle (T2)
Scenarios executed:
- opening several tabs with different `title` values
- opening a tab with an explicit `cwd`
- checking the tab list with `list_tabs()`
- switching the active tab with `focus_tab(...)`
- closing an intermediate tab with `close_tab(...)`
- checking that the remaining IDs stay consistent after the close
- `focus_tab(...)` on a nonexistent `tab_id` → expected RuntimeError
Result: `PASS`
Notes:
- tab IDs stay consistent after intermediate closes
- the `active` state correctly follows the focused tab
- errors for a nonexistent `tab_id` are uniform across all tab-scoped tools
### 3. Command Execution with Auto Submit (T3)
Scenarios executed:
- `exec_command(tab_id, "pwd", auto_submit=True)` on a tab with `cwd=/home/enne2`
- `exec_command(tab_id, "pwd", auto_submit=True)` on a tab with `cwd=/home/enne2/dev/mcp-hal9002`
- `exec_command(tab_id, "whoami", auto_submit=True)`
- `exec_command(tab_id, "ls /definitely-missing-path", auto_submit=True)`
- `exec_command(tab_id, "uname -r", auto_submit=True)`
Result: `PASS`
Notes:
- with `auto_submit=True` the command is sent immediately, so only its completion has to be awaited
- the `cwd` reported in the results matches the tab's directory
- runtime errors of the command (`exit_code=1`) are tracked correctly
- the returned `after_sequence` is ready to pass to `wait_for_command_result`
### 4. after_sequence Pattern (T4)
Scenarios executed:
- submit of `ls /tmp` with `auto_submit=True`, saving `after_sequence`
- call to `wait_for_command_result(tab_id, after_sequence=...)` with the saved value
- submit of a second command and check that its different `after_sequence` isolates separate results
Result: `PASS`
Notes:
- `after_sequence` isolates exactly the command that was just issued
- results include `sequence`, `command`, `cwd`, `cwd_after`, `exit_code`, `duration_seconds`, `text`
- the `text` field contains the output captured between the shell markers, stripped of prompt and echo
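
The isolation behavior described above can be illustrated with a small in-memory model. This is only a sketch: `FakeTab` and `result_after` are hypothetical names, and the rule "return the first event whose `sequence` is greater than `after_sequence`" is an assumption inferred from the report, not the actual implementation in `server.py`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeTab:
    """Hypothetical in-memory stand-in for a tab's tracked command events."""

    events: list[dict] = field(default_factory=list)

    def latest_sequence(self) -> int:
        return self.events[-1]["sequence"] if self.events else 0

    def run(self, command: str, text: str, exit_code: int = 0) -> int:
        """Record a completed command and return the baseline seen at submit time."""
        after_sequence = self.latest_sequence()
        self.events.append(
            {
                "sequence": after_sequence + 1,
                "command": command,
                "exit_code": exit_code,
                "text": text,
            }
        )
        return after_sequence


def result_after(tab: FakeTab, after_sequence: int) -> dict | None:
    """Return the first tracked event newer than the baseline, if any."""
    for event in tab.events:
        if event["sequence"] > after_sequence:
            return event
    return None


tab = FakeTab()
first_baseline = tab.run("ls /tmp", "a.txt\nb.txt")
second_baseline = tab.run("pwd", "/home/user")

# Each baseline selects only the command submitted right after it.
print(result_after(tab, first_baseline)["command"])
print(result_after(tab, second_baseline)["command"])
```

Because each baseline is captured at submit time, a later command cannot be mistaken for an earlier one even when both are complete by the time the result is read.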
### 5. Reading Output and Results (T5)
Scenarios executed:
- reading raw scrollback with `read_tab(...)` after an `exec_command`
- reading the last tracked result with `read_last_command_result(...)`
- waiting for the result with `wait_for_command_result(...)`
- comparing metadata between `read_last_command_result(...)` and `wait_for_command_result(...)`
- usage distinction: `read_tab` for raw output, `read_last_command_result` for structured metadata
Result: `PASS`
Notes:
- `read_tab` returns raw text including prompt and echo: useful for debugging and delegated sessions
- `read_last_command_result` provides clean output between the hook markers: useful for structured parsing
- both methods converge on the same metadata for the same completed command
- `read_tab` is the only practical option in delegated sessions, where command events are not emitted
### 6. Waits and Timeouts (T6)
Scenarios executed:
- explicit pause with `wait_for_command(delay_seconds=2.0)`
- short timeout on a long command with `wait_for_running_command(tab_id, timeout=0.5)` → expected RuntimeError
- `wait_for_running_command(tab_id)` on an idle tab → RuntimeError "no tracked command is currently running"
- long wait for completion on the same running command
- wait for a result with the correct `after_sequence` on `du -ah /usr`
Result: `PASS`
Notes:
- `wait_for_running_command` on an idle tab fails immediately, as expected
- the short timeout fails correctly while the command is still running
- the long wait reaches the expected completion
- `wait_for_running_command(...)` and `wait_for_command_result(...)` return consistent metadata for the same final event
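
The timeout behavior exercised in T6 can be sketched as a generic polling loop. The deadline expression mirrors the one visible in `wait_for_running_command` in this commit; everything else (`poll_until`, the `check` callback) is a hypothetical helper written for illustration, not code from the repository.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    check: Callable[[], Optional[T]],
    timeout: Optional[float] = None,
    poll_interval: float = 0.1,
) -> T:
    """Call check() until it returns a non-None value or the deadline passes."""
    # None or a non-positive timeout means "wait indefinitely".
    deadline = None if timeout is None or timeout <= 0 else time.monotonic() + timeout
    while True:
        value = check()
        if value is not None:
            return value
        if deadline is not None and time.monotonic() >= deadline:
            raise RuntimeError(f"timed out after {timeout} seconds")
        time.sleep(poll_interval)


# Usage: a fake "command" that reports completion on the third poll.
polls = {"count": 0}


def fake_command_finished() -> Optional[dict]:
    polls["count"] += 1
    return {"exit_code": 0} if polls["count"] >= 3 else None


result = poll_until(fake_command_finished, timeout=5.0, poll_interval=0.01)
```

A short timeout on a check that never succeeds raises `RuntimeError`, which matches the "short timeout on a long command" scenario; a generous timeout on the same loop returns the result once it becomes available.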
### 7. Screenshot: Targets and Naming (T7 / T8)
Scenarios executed:
- screenshot with `target="window"` and diagnostic overlay
- screenshot with `target="tab"` and diagnostic overlay
- screenshot with `target="tab-container"` and diagnostic overlay
- both `target="tab"` and `target="tab-container"` on the same `tab_id` within the same second → naming collision confirmed
- same scenario with an explicit `path` for each call → no collision
Result: `PASS` with a known anomaly (see Anomaly 1)
Notes:
- capture works for every target tried
- the JSON sidecar metadata is populated correctly
- the naming collision is reproducible and confirmed
- the explicit `path` workaround fully resolves the collision
- the `summary` field in the result contains the path, dimensions, widget type, and renderer
### 8. Errors and Guardrails (T8)
Scenarios executed:
- `exec_command(..., auto_submit=True)` with blocked shell syntax: `ls | head`
- `exec_command(..., auto_submit=True)` with blocked shell syntax: `echo hi && echo bye`
- `exec_command(..., auto_submit=True)` with a forbidden command: `python3 -c 'print(1)'`
- `exec_command(..., auto_submit=True)` with a command not on the whitelist: `jq ...`
- `focus_tab(...)` on a nonexistent `tab_id`
- `read_tab(...)` on a nonexistent `tab_id`
- `close_tab(...)` on a nonexistent `tab_id`
Result: `PASS`
Notes:
- error messages are consistent with the guardrail rules
- errors are raised before the submit, not after
- no corrupted terminal state was observed after tool-side errors
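
A guard of this kind can be approximated as follows. The whitelist is copied from the server instructions added in this commit; the function name `check_auto_submit`, the exact blocked-token list, and the substring-matching strategy are assumptions made for this sketch. The real `_auto_submit_guard` is not shown in the diff and may differ.

```python
import shlex

# Whitelist as listed in the server instructions of this commit.
AUTO_SUBMIT_ALLOWED = frozenset(
    "cat date df du echo env file find free grep head hostname id ip ls "
    "netstat pgrep printenv ps pwd readlink rg ss stat tail uname uptime whoami".split()
)

# Blocked shell syntax; this token list is an assumption for the sketch.
BLOCKED_TOKENS = (";", "&&", "|", ">", "<", "`", "$(")


def check_auto_submit(command: str) -> None:
    """Raise RuntimeError before submit if the command is not safe to auto-submit."""
    for token in BLOCKED_TOKENS:
        # Conservative substring check: it also rejects quoted metacharacters.
        if token in command:
            raise RuntimeError(f"blocked shell syntax {token!r} in auto_submit command")
    try:
        words = shlex.split(command)
    except ValueError as exc:
        raise RuntimeError(f"cannot parse command: {exc}") from exc
    if not words:
        raise RuntimeError("empty command")
    if words[0] not in AUTO_SUBMIT_ALLOWED:
        raise RuntimeError(f"command {words[0]!r} is not in the auto_submit whitelist")


for candidate in ("pwd", "ls | head", "python3 -c 'print(1)'"):
    try:
        check_auto_submit(candidate)
        print(f"allowed: {candidate}")
    except RuntimeError as error:
        print(f"rejected: {candidate} ({error})")
```

Running the check before any text reaches the terminal is what keeps the terminal state clean after a rejection, as noted in the scenario results.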
### 9. open_tab(command=...) Behavior (T9)
Scenarios executed:
- opening a tab with `open_tab(command="uname -r")` (terminating command)
- check with `read_last_command_result(...)` after ~500 ms
- opening a tab with `open_tab(command="bash --norc")` (persistent sub-shell)
- attempt to call `read_last_command_result(...)` in the tab with the sub-shell
Result: `PASS` with a critical distinction
Notes:
- **terminating commands** (`uname -r`, `ls /tmp`, etc.) injected via `command=` **are tracked** as
  `sequence=1` with the normal command event metadata
- **persistent sub-shells** (`bash`, `ssh`, `python3` REPL) started via `command=` **are not tracked**:
  the sub-shell does not inherit the MCP hook, so `read_last_command_result` raises RuntimeError
- this distinction was documented incorrectly in the `read_last_command_result` docstring; it was corrected during the test
### 10. Manual Submit (T11)
Scenarios executed:
- `exec_command(tab_id, "echo 'ciao dal test'", auto_submit=False)` → blocking call
- the user presses Enter in the GUI after 2–3 seconds
- check that `exec_command` returns `submitted_manually=true`
- check of the `newline_ignored` field in the result
- call to `wait_for_command_result(tab_id, after_sequence=...)` on the result
Result: `PASS`
Notes:
- `exec_command` blocks until the user presses Enter in the terminal
- the `submitted_manually=true` field correctly identifies the mode
- `after_sequence` is also available in manual mode to filter for the expected result
- `newline_ignored=false` in manual mode (the newline is part of the user's action)
### 11. Waiting on a Long Command with wait_for_running_command (T12)
Scenarios executed:
- the user types and submits `sleep 4` in the GUI manually
- immediate call to `wait_for_running_command(tab_id)` from the tool
- check that the tool waits for the sleep to finish
- check of the result metadata: `duration_seconds` ≥ 4
Result: `PASS`
Notes:
- `wait_for_running_command` correctly detects the `running` state and waits for completion
- the `duration_seconds` field reflects the actual execution time
- this scenario has no `after_sequence`, which demonstrates the intended use case for `wait_for_running_command`
### 12. Delegated Sessions: in_delegated_session (T13 + S1–S7)
This section covers the manually started sub-shell and the `in_delegated_session` flag.
#### T13: First manual sub-shell scenario
Scenarios executed:
- `exec_command(tab_id, "bash --norc", auto_submit=False)` with the user pressing Enter → sub-shell active
- check that the tab state becomes `interactive-session`
- injection of `echo test` via `exec_command(tab_id, "echo test", auto_submit=False)` into the sub-shell
- check that the response contains `in_delegated_session: true`
Result: `PASS`
Notes:
- the `in_delegated_session` flag is present in the response for both `auto_submit=True` and `False`
- injection into the sub-shell works at the VTE level but produces no tracked command events
#### Sub-shell campaign S1–S7
Scenarios executed:
- **S1**: fresh tab opened, manual submit of `bash --norc` via the GUI
  Check: `running_command_status(tab_id)` reports `state="interactive-session"` → `PASS`
- **S2**: `exec_command(tab_id, "echo test_s2", auto_submit=False)` in the active sub-shell,
  the user presses Enter
  Check: the response contains `in_delegated_session: true` → `PASS`
- **S3**: `read_tab(tab_id)` on the same tab
  Check: the raw VTE output contains `test_s2` → `PASS`
- **S4**: `wait_for_running_command(tab_id)` → immediate RuntimeError for the `interactive-session` state
  `wait_for_command_result(tab_id, after_sequence=..., timeout=3)` → RuntimeError on timeout
  Check: both fail as expected; the correct path is `read_tab` + `wait_for_prompt` → `PASS`
- **S5**: the user types `exit` in the sub-shell → return to the parent MCP shell
  `wait_for_prompt(tab_id, timeout=10)` → parent prompt detected
  Check: `state="prompt"`, `last_line` contains the MCP shell prompt → `PASS`
- **S6**: `exec_command(tab_id, "whoami", auto_submit=True)` after exiting the sub-shell
  Check: works normally, `in_delegated_session: false` → `PASS`
  Note: an accidental manual intervention in the GUI added spurious text, removed with Ctrl+C
- **S7**: cleanup with `close_tab(tab_id)`
  Check: tab closed correctly → `PASS`
Overall result: `PASS`
Technical notes:
- the GUI tracks sub-shells in `TabState` with `delegated_session_*` fields
- `rpc_exec` in `gui.py` detects `state="interactive-session"` and sets `pending_submit_in_delegated_session=True`
- the `in_delegated_session` flag is propagated in both `exec_command` paths in `server.py`
- `wait_for_running_command` raises immediately for `interactive-session` (correct behavior)
- `wait_for_prompt` is the recommended observation tool in delegated sessions
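
The choice of follow-up tool after `exec_command()` can be summarized as a small decision function. This is a sketch only: `next_steps` is a hypothetical helper, and it returns tool names as strings rather than calling the real MCP tools. The field names `in_delegated_session` and `after_sequence` come from the `exec_command` response described in this commit.

```python
def next_steps(exec_result: dict) -> list[str]:
    """Suggest which tools to call after an exec_command() response."""
    if exec_result.get("in_delegated_session", False):
        # No tracked command events inside a sub-shell: observe the raw terminal.
        return ["read_tab", "wait_for_prompt"]
    if exec_result.get("after_sequence") is not None:
        # A baseline is available, so the exact command can be targeted.
        return ["wait_for_command_result"]
    # Command submitted outside exec_command(), with no baseline to filter on.
    return ["wait_for_running_command"]


print(next_steps({"in_delegated_session": True, "after_sequence": 3}))
print(next_steps({"in_delegated_session": False, "after_sequence": 3}))
print(next_steps({}))
```

Checking the delegated-session flag first matters because, as S4 shows, both tracked wait tools fail in that state even when an `after_sequence` is present.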
## Observed Anomalies
### 1. Screenshot Naming Collision
Status: **OPEN** (workaround available)
Symptom:
- captures with `target="tab"` and `target="tab-container"` on the same `tab_id` within the same second
  produce the same default path; the second silently overwrites the first
Impact:
- one PNG or JSON file can overwrite the other
Confirmed workaround:
- passing an explicit `path` to both calls fully resolves the problem (verified in T7/T8)
Affected area:
- default screenshot path generation in `gui.py`
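
One possible direction for a fix, sketched under assumptions: include the target in the default file name and append a per-process serial number so that two captures in the same second cannot collide. The function name, the file-name layout, and the sidecar convention below are hypothetical; the actual naming code in `gui.py` is not part of this diff.

```python
from datetime import datetime
from itertools import count
from pathlib import Path

# Process-wide serial number; keeps names unique even within the same second.
_capture_serial = count(1)


def default_screenshot_path(directory: Path, tab_id: str, target: str, now: datetime) -> Path:
    """Build a default PNG path that encodes the target and a unique serial."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    serial = next(_capture_serial)
    return directory / f"{tab_id}-{target}-{stamp}-{serial:04d}.png"


same_second = datetime(2026, 3, 16, 12, 0, 0)
shots = Path("/tmp/shots")
tab_path = default_screenshot_path(shots, "tab-1", "tab", same_second)
container_path = default_screenshot_path(shots, "tab-1", "tab-container", same_second)

# Hypothetical sidecar convention: same stem, .json suffix.
tab_sidecar = tab_path.with_suffix(".json")
print(tab_path, container_path, tab_sidecar, sep="\n")
```

Until something along these lines is in place, the explicit `path` workaround remains the confirmed way to avoid overwrites.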
### 2. Tracking of Interactive Startup Commands
Status: **PARTIALLY RESOLVED**
Original symptom:
- `open_tab(command="bash")` produced output but made neither the prompt nor the command events observable
Post-T9 update:
- **terminating commands** (`uname -r`, `ls`, etc.) injected via `open_tab(command=...)` **are confirmed as tracked**
  correctly (`sequence=1`, `exit_code=0`)
- **persistent sub-shells** (`bash`, `python3 REPL`, `ssh`) started via `open_tab(command=...)` remain untracked
  because the sub-shell does not inherit the MCP hook
Residual impact:
- only the persistent sub-shell case via `open_tab(command=...)` is still untracked
- the more useful case (a sub-shell started via manual `exec_command`) is fully supported
  through the `in_delegated_session` flag + `read_tab` + `wait_for_prompt`
### 3. Execution Timestamps to Recheck
Status: **OPEN** (low impact under normal conditions)
Symptom:
- some observed `started_at` and `duration_seconds` values appear to start before the perceived actual submit
Impact:
- timing metadata may be less reliable than expected in command reports
Affected area:
- shell hook that writes the transcript and events
## Cases Still to Test Manually
All the cases listed in the previous version of this document have been covered:
- `exec_command(..., auto_submit=False)` with the command edited → **covered in T11**
- `newline=True` ignored in manual mode → **confirmed in T11** (`newline_ignored` in the result)
- delegated sessions (sub-shell, SSH) started with manual submit → **covered in T13 + S1–S7**
- combined use of `wait_for_prompt(...)` and `wait_for_running_command(...)` in interactive sessions → **covered in S4+S5**
No relevant case remains to be tested under current conditions.
## Final State of the Test Environment
Cleanup performed after every campaign:
- closing the tabs created for the tests
- final check with only the Home tab present or the GUI stopped
Cleanup result: `PASS` for all campaigns
## Conclusion
Most of the tested MCP tools work in non-interactive cases and in the standard terminal observation flows. Delegated sessions (sub-shell, SSH) are now supported through the `in_delegated_session` flag in the `exec_command` output and the combined use of `read_tab` + `wait_for_prompt`.
Two residual problems remain open: the screenshot naming collision (workaround available with an explicit `path`) and the minor uncertainty about execution timestamps.

30
src/gnome_vte_mcp/gui.py

@ -67,6 +67,7 @@ class TabState:
delegated_session_command: str | None
delegated_session_started_at: str | None
delegated_session_after_sequence: int | None
pending_submit_in_delegated_session: bool = False
class TerminalWindow(Gtk.ApplicationWindow):
@ -519,6 +520,7 @@ class TerminalApp(Gtk.Application):
delegated_session_command=None,
delegated_session_started_at=None,
delegated_session_after_sequence=None,
pending_submit_in_delegated_session=False,
)
self.tabs[tab_id] = tab
show_onboarding = len(self.tabs) == 1 and command is None
@ -530,7 +532,7 @@ class TerminalApp(Gtk.Application):
self._write_shell_rc_file(tab)
self._spawn_shell(tab, cwd=working_directory)
if command:
GLib.timeout_add(150, self._feed_terminal_input, tab_id, command + "\n")
GLib.timeout_add(150, self._feed_terminal_input, tab_id, command, True)
self.window.present()
return self._serialize_tab(tab)
@ -649,11 +651,16 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
None,
)
def _feed_terminal_input(self, tab_id: str, payload: str) -> bool:
def _send_terminal_enter(self, terminal: Vte.Terminal) -> None:
terminal.feed_child(b"\r")
def _feed_terminal_input(self, tab_id: str, payload: str, submit: bool = False) -> bool:
tab = self.tabs.get(tab_id)
if tab is None:
return False
tab.terminal.paste_text(payload)
if submit:
self._send_terminal_enter(tab.terminal)
return False
def _looks_like_interactive_handoff(self, command: str) -> bool:
@ -723,7 +730,8 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
raise RuntimeError(f"tab_id={tab.tab_id} has no pending submit to finalize")
after_sequence = self._latest_command_sequence(tab)
delegated = self._looks_like_interactive_handoff(tab.pending_submit_text or "")
delegated = tab.pending_submit_in_delegated_session or self._looks_like_interactive_handoff(tab.pending_submit_text or "")
tab.pending_submit_in_delegated_session = False
tab.last_manual_submit_id = tab.pending_submit_id
tab.last_manual_submit_text = tab.pending_submit_text
tab.last_manual_submit_at = submitted_at
@ -862,7 +870,11 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
try:
return self.tabs[tab_id]
except KeyError as exc:
raise RuntimeError(f"Unknown tab_id: {tab_id}") from exc
raise RuntimeError(
f"Unknown tab_id: {tab_id!r}. "
"Call open_tab() to create a tab and get a valid tab_id, "
"or list_tabs() to see existing tab IDs."
) from exc
def _sanitize_text(self, text: str) -> str:
cleaned = OSC_ESCAPE_RE.sub("", text)
@ -1201,8 +1213,7 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
)
if running_status["state"] == "completed":
self._clear_current_execution(tab)
if running_status["state"] == "interactive-session":
self._clear_current_execution(tab)
in_delegated_session = running_status["state"] == "interactive-session"
if tab.pending_submit_id is not None:
raise RuntimeError(
f"tab_id={tab_id} already has a pending manual submit request; press Enter in the GUI before writing another command"
@ -1219,8 +1230,11 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
tab.pending_submit_id = submission_id
tab.pending_submit_text = command
tab.pending_submit_requested_at = requested_at
if in_delegated_session:
tab.pending_submit_in_delegated_session = True
if auto_submit:
tab.terminal.paste_text(payload + "\n")
tab.terminal.paste_text(payload)
self._send_terminal_enter(tab.terminal)
submitted_at = datetime.now().astimezone().isoformat(timespec="milliseconds")
after_sequence = self._finalize_submission(tab, submitted_at=submitted_at)
return {
@ -1233,6 +1247,7 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
"submitted_at": submitted_at,
"current_sequence": self._latest_command_sequence(tab),
"newline_ignored": False,
"in_delegated_session": in_delegated_session,
}
tab.terminal.paste_text(payload)
@ -1244,6 +1259,7 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
"submission_id": submission_id,
"requested_at": requested_at,
"newline_ignored": newline,
"in_delegated_session": in_delegated_session,
}
def rpc_manual_submit_status(self, tab_id: str, submission_id: str) -> dict[str, Any]:

281
src/gnome_vte_mcp/server.py

@ -20,7 +20,23 @@ from mcp.server.fastmcp import FastMCP
from .control import ControlError, default_socket_path, request, wait_for_socket
mcp = FastMCP("mcp-hal9002")
mcp = FastMCP(
"mcp-hal9002",
instructions=(
"This server controls a live GTK/VTE terminal on the user's desktop. "
"Call exec_command() to run commands directly — do not ask the user for permission first.\n\n"
"Standard workflow:\n"
"1. exec_command('command', auto_submit=True) → runs command immediately (tab is auto-created)\n"
"2. wait_for_command_result(tab_id, after_sequence=result['after_sequence']) → returns output, exit_code, cwd\n\n"
"tab_id is optional in exec_command(). When omitted, the first open tab is used automatically "
"(or a new tab is created). Only call open_tab() explicitly when you need a specific tab.\n\n"
"auto_submit=True is allowed for these read-only commands:\n"
" cat date df du echo env file find free grep head hostname id ip ls\n"
" netstat pgrep printenv ps pwd readlink rg ss stat tail uname uptime whoami\n\n"
"All other commands (bash, python3, sudo, rm, cp, ssh, …) require auto_submit=False: "
"the text is pasted into the terminal and the user must press Enter in the GUI to confirm."
),
)
COMMAND_START_MARKER = "__MCP_HAL9002_CMD_START__"
COMMAND_END_MARKER = "__MCP_HAL9002_CMD_END__"
ANSI_ESCAPE_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
@ -393,7 +409,12 @@ def call_gui(method: str, *, ensure_running: bool = True, **params: Any) -> Any:
@mcp.tool()
def gui_status() -> dict[str, Any]:
"""Return GUI lifecycle state without starting a new window."""
"""Return GUI lifecycle state without starting a new window.
Returns ``running=true`` plus window geometry and tab count when the GUI
process is active, or ``running=false`` when it has not been started yet.
Safe to call at any time; never auto-launches the GUI.
"""
socket_path = _socket_path()
if not _is_gui_running(socket_path):
return {
@ -408,14 +429,26 @@ def gui_status() -> dict[str, Any]:
@mcp.tool()
def open_gui() -> dict[str, Any]:
"""Start the GUI if needed and present the shared window."""
"""Start the GUI if not already running and present the shared window.
The GUI is a single shared instance; calling this when it is already running
is idempotent (brings the window to the foreground). To get a usable
terminal tab in one step, call ``open_tab()`` instead; it auto-launches
the GUI if needed.
"""
ensure_gui()
return call_gui("show_window")
@mcp.tool()
def close_gui() -> dict[str, Any]:
"""Close the shared GUI window if it is currently running."""
"""Close the shared GUI window and terminate the GUI process.
All open tabs are destroyed. Subsequent tool calls that require the GUI
will auto-relaunch it as a new process. Returns ``closed=true`` if the
process was running; returns ``closed=false`` without error when it was
already stopped.
"""
socket_path = _socket_path()
if not _is_gui_running(socket_path):
return {
@ -431,39 +464,128 @@ def close_gui() -> dict[str, Any]:
@mcp.tool()
def open_tab(title: str | None = None, cwd: str | None = None, command: str | None = None) -> dict[str, Any]:
"""Open a new terminal tab in the GTK/VTE prototype window."""
"""Open a new terminal tab and return its ``tab_id``.
Parameters
----------
title:
Label shown on the tab header.
cwd:
Working directory for the shell. Defaults to the user home directory.
command:
Shell command injected ~150 ms after the terminal spawns.
Commands injected here that **terminate on their own** (e.g.,
``uname -r``, ``ls``) are tracked normally and visible via
``read_last_command_result()`` and ``wait_for_command_result()``.
Commands that start a **persistent interactive sub-shell** (e.g.,
``bash``, ``ssh``, ``python3`` REPL) are NOT tracked: the sub-shell
inherits neither the MCP shell hook nor its prompt markers, so
``read_last_command_result()`` will raise and ``wait_for_prompt()``
may time out.
For a tracked interactive session opened with a typed command, open
the tab first and then call ``exec_command()`` without auto_submit.
The GUI is auto-launched if not already running.
"""
return call_gui("open_tab", title=title, cwd=cwd, command=command)
@mcp.tool()
def list_tabs() -> list[dict[str, Any]]:
"""List the currently known tabs managed by the prototype GUI."""
"""List all currently open tabs managed by the GUI.
Returns a list of dicts, each with ``tab_id``, ``title``, ``cwd``, and
``active`` (whether it is the focused tab). Use tab_ids from this list
with all other tab-scoped tools.
"""
return call_gui("list_tabs")
@mcp.tool()
def focus_tab(tab_id: str) -> dict[str, Any]:
"""Bring a specific tab to the foreground in the prototype GUI."""
"""Bring a specific tab to the foreground in the GUI window.
Does not affect command execution or tracking. Useful before a screenshot
to ensure the desired tab is visible in ``target="window"`` captures.
"""
return call_gui("focus_tab", tab_id=tab_id)
@mcp.tool()
def exec_command(
tab_id: str,
command: str,
tab_id: str | None = None,
newline: bool = True,
auto_submit: bool = False,
poll_interval: float = 0.1,
) -> dict[str, Any]:
"""Write command text into an existing tab and block until the user manually presses Enter in the GUI.
By default the tool blocks indefinitely until manual submission is detected, not after command completion.
When `auto_submit=True`, it immediately sends Enter too, but only for small read-only commands.
Use wait_for_command_result() to wait for the command to finish and collect output.
"""Inject command text into a terminal tab and optionally auto-submit it.
``tab_id`` is optional. When omitted (or None), the first open tab is used
automatically. If no tab exists yet, one is created on the fly. Pass an
explicit ``tab_id`` (from ``open_tab()`` or ``list_tabs()``) only when you
need to target a specific tab among several.
Default mode (``auto_submit=False``):
Paste the command text and block indefinitely until the user manually
presses Enter in the GUI window. The ``newline`` parameter controls
whether a newline hint is appended to the pasted text in the input line.
Returns ``after_sequence``; pass this to ``wait_for_command_result()``
to await the result of exactly this command.
``auto_submit=True`` mode:
Immediately paste the text and send an Enter keystroke without any human
interaction. Allowed only for a short whitelist of read-only commands:
cat, date, df, du, echo, env, file, find, free, grep, head, hostname,
id, ip, ls, netstat, pgrep, printenv, ps, pwd, readlink, rg, ss, stat,
tail, uname, uptime, whoami.
Shell metacharacters (; && | > < ` $()) are blocked. Commands such as
bash, python3, sudo, rm, cp, mv, ssh are also blocked.
The ``newline`` parameter is ignored when ``auto_submit=True``.
Typical pattern for auto_submit (no tab setup needed):
1. result = exec_command("pwd", auto_submit=True)
2. outcome = wait_for_command_result(result["tab_id"],
after_sequence=result["after_sequence"])
For a command that has already been manually submitted and is still
running, use ``wait_for_running_command()`` instead.
Delegated interactive sessions (sub-shell, SSH, REPL):
``exec_command()`` can still inject text into a delegated session;
the response will include ``in_delegated_session: true``. In that state
tracked command-completion events are NOT emitted by the sub-shell, so
do NOT call ``wait_for_command_result()`` or ``wait_for_running_command()``
afterward. Instead use ``read_tab()`` to observe output and
``wait_for_prompt()`` to detect when the session returns to the parent
shell, or ``wait_for_command()`` for a fixed sleep.
Returns
-------
tab_id (str), written_text (str), submission_id (str), requested_at (str),
submitted_at (str), submitted_manually (bool), after_sequence (int; the
minimum sequence required by ``wait_for_command_result`` to target this
specific command), current_sequence (int), newline_ignored (bool),
in_delegated_session (bool).
Example: auto_submit two-step flow
----------------------------------
r = exec_command("ls /tmp", auto_submit=True)
# r["in_delegated_session"] is False → safe to proceed
out = wait_for_command_result(r["tab_id"], after_sequence=r["after_sequence"])
# out["exit_code"] == 0, out["text"] contains directory listing
"""
if auto_submit:
_auto_submit_guard(command)
if tab_id is None:
tabs = call_gui("list_tabs")
if tabs:
tab_id = tabs[0]["tab_id"]
else:
new_tab = call_gui("open_tab", title=None, cwd=None, command=None)
tab_id = new_tab["tab_id"]
result = call_gui("exec", tab_id=tab_id, command=command, newline=newline, auto_submit=auto_submit)
if auto_submit:
return {
@ -476,6 +598,7 @@ def exec_command(
"after_sequence": result["after_sequence"],
"current_sequence": result["current_sequence"],
"newline_ignored": False,
"in_delegated_session": result.get("in_delegated_session", False),
}
submitted = _wait_for_manual_submit(
@ -493,14 +616,19 @@ def exec_command(
"after_sequence": result["after_sequence"],
"current_sequence": submitted["current_sequence"],
"newline_ignored": result["newline_ignored"],
"in_delegated_session": result.get("in_delegated_session", False),
}
@mcp.tool()
def wait_for_command(delay_seconds: float) -> dict[str, Any]:
"""Block synchronously for a fixed amount of time.
"""Sleep for a fixed number of seconds (simple timer helper).
This is a simple sleep helper for workflows driven by repeated `read_tab()` calls.
This tool does NOT monitor any terminal event or command state; it is a
plain sleep. Use it only as a last-resort delay between ``read_tab()``
polls when the more structured ``wait_for_command_result()`` or
``wait_for_running_command()`` cannot be applied (e.g., an arbitrary
interactive process with no tracked command events).
"""
if delay_seconds < 0:
raise RuntimeError("delay_seconds must be >= 0")
@ -520,13 +648,43 @@ def wait_for_command(delay_seconds: float) -> dict[str, Any]:
@mcp.tool()
def read_tab(tab_id: str, last_n_lines: int = 200) -> dict[str, Any]:
"""Read the trailing scrollback text from a tab."""
"""Return the trailing scrollback text from a tab as a raw string.
Returns up to ``last_n_lines`` lines of ANSI-stripped terminal transcript.
The output is raw and mixes prompts, echoed input, and command output.
PREFER THIS WHEN:
- the tab is inside a delegated session (``in_delegated_session=true``)
where tracked command events are unavailable
- you want raw terminal state or partial output while a command runs
PREFER ``read_last_command_result()`` WHEN you need structured metadata:
exit_code, timing, cwd, or clean isolated output for a completed command.
"""
return call_gui("read_tab", tab_id=tab_id, last_n_lines=last_n_lines)
@mcp.tool()
def read_last_command_result(tab_id: str) -> dict[str, Any]:
"""Read the last completed command result for a tab, including cwd and execution timestamps."""
"""Return the most recent tracked command result for a tab.
Returns a dict with: command, cwd, cwd_after, started_at, finished_at,
duration_seconds, exit_code, and text (the captured output between the
shell hook start and end markers, clean of prompts and echoed input).
Commands tracked here include those submitted via ``exec_command()`` and
terminating startup commands injected via ``open_tab(command=...)``.
Persistent interactive sub-shells started via ``open_tab(command="bash")``
are NOT tracked (they inherit no MCP shell hook).
Raises RuntimeError when no tracked command has completed yet for this tab.
PREFER THIS over ``read_tab()`` when you need structured metadata:
exit_code, timing, cwd, or clean isolated output for a completed command.
PREFER ``read_tab()`` when the tab is in a delegated session
(``in_delegated_session=true``) or when you need the raw scrollback.
To wait for a specific future command, use ``wait_for_command_result()``
with the ``after_sequence`` from ``exec_command()``.
"""
socket_path = ensure_gui()
event = _latest_command_event(socket_path, tab_id)
if event is None:
@ -536,9 +694,24 @@ def read_last_command_result(tab_id: str) -> dict[str, Any]:
@mcp.tool()
def wait_for_command_result(tab_id: str, after_sequence: int | None = None, timeout: float | None = None, poll_interval: float = 0.1) -> dict[str, Any]:
"""Block until the next completed command is observed in a tab and return its tracked result.
"""Block until the next tracked command completes and return its result.
CHOOSE THIS TOOL when you called ``exec_command()`` and have
``after_sequence``; it targets the exact command you submitted.
CHOOSE ``wait_for_running_command()`` when a command was manually
submitted and you have no ``after_sequence``.
Pass ``after_sequence=result["after_sequence"]`` from ``exec_command()``
to target the specific command you just submitted. When ``after_sequence``
is omitted the current latest sequence is used as the baseline.
When `timeout` is omitted or set to a non-positive value, wait indefinitely.
``timeout`` <= 0 or None means wait indefinitely. Raises RuntimeError on
timeout or if the tab does not exist.
Returns: tab_id (str), sequence (int), command (str), cwd (str),
cwd_after (str), started_at (str), finished_at (str),
duration_seconds (float), exit_code (int), text (str; clean command
output between the shell hook start and end markers).
"""
socket_path = ensure_gui()
baseline = after_sequence
@ -558,10 +731,24 @@ def wait_for_command_result(tab_id: str, after_sequence: int | None = None, time
@mcp.tool()
def wait_for_running_command(tab_id: str, timeout: float | None = None, poll_interval: float = 0.1) -> dict[str, Any]:
"""Block until the command currently executing in a tab finishes and return its tracked result.
"""Block until the command currently executing in a tab finishes.
CHOOSE THIS TOOL when a command was manually submitted and you have no
``after_sequence``.
CHOOSE ``wait_for_command_result()`` when you called ``exec_command()``
and have ``after_sequence``: it targets the specific command precisely.
Use this when a command has already been manually submitted and the
terminal is still busy, but you do not have the ``after_sequence`` from
``exec_command()``.
Raises RuntimeError immediately if the tab is idle and no prior running
command was observed in this call. Also raises if the tab is inside an
interactive delegated session (e.g., SSH or a nested shell), because tracked
command completion is unavailable in that state; use ``wait_for_prompt()``
instead.
Use this when a command has already been manually submitted and the terminal is still busy.
When `timeout` is omitted or set to a non-positive value, wait indefinitely.
``timeout`` <= 0 or None means wait indefinitely.
"""
socket_path = ensure_gui()
deadline = None if timeout is None or timeout <= 0 else time.monotonic() + timeout
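Review note: the timeout convention these docstrings describe (``None`` or a non-positive value means wait indefinitely) can be sketched as a small polling helper. This is an illustration under assumptions, not the server's implementation: `poll_until` and its `check` callback are hypothetical names, and only the deadline expression mirrors the line above.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    check: Callable[[], Optional[T]],
    timeout: Optional[float] = None,
    poll_interval: float = 0.1,
) -> T:
    """Call check() until it returns a non-None value or the deadline passes.

    Hypothetical helper: `check` stands in for whatever the server polls
    (for example, a lookup of the latest tracked command event).
    """
    # None or a non-positive timeout disables the deadline (wait indefinitely).
    deadline = None if timeout is None or timeout <= 0 else time.monotonic() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if deadline is not None and time.monotonic() >= deadline:
            raise RuntimeError("Timed out waiting for condition")
        time.sleep(poll_interval)
```

Using ``time.monotonic()`` rather than wall-clock time keeps the deadline unaffected by system clock adjustments while the call is blocked.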
@ -602,10 +789,27 @@ def wait_for_prompt(
idle_seconds: float = 0.4,
prompt_pattern: str | None = None,
) -> dict[str, Any]:
"""Block until terminal output becomes idle and the trailing line looks like a shell prompt.
This is intended for delegated interactive sessions such as SSH, where tracked command markers
are unavailable. When `prompt_pattern` is omitted, a conservative default prompt regex is used.
"""Block until terminal output goes idle and the last line looks like a shell prompt.
Intended for delegated interactive sessions (e.g., SSH, remote shells)
where tracked command-completion events are unavailable. Detection is
heuristic: the transcript log must be stable for at least ``idle_seconds``
and the last non-empty line must match ``prompt_pattern``. The default
pattern matches common endings like ``$ ``, ``# ``, ``> ``, ``]# ``.
LIMITATION: if ``open_tab(command="bash")`` was used to start a sub-shell,
the shell hook markers may not be injected into it, and this tool may time
out. In that case provide an explicit ``prompt_pattern`` that matches the
sub-shell's prompt, or open the interactive session by manually submitting
via ``exec_command()`` without auto_submit.
``timeout`` <= 0 or None means wait indefinitely.
Raises RuntimeError:
``"No transcript log exists yet for tab_id: {tab_id}"``: the tab
transcript has not been initialised yet (tab too new or missing).
``"Timed out waiting for a visible prompt on tab_id=..."``: the
timeout elapsed before a stable prompt line was detected.
"""
socket_path = ensure_gui()
log_path = _tab_log_path(socket_path, tab_id)
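Review note: the "last non-empty line must match ``prompt_pattern``" half of the heuristic could look roughly like the sketch below. The regex is an assumption chosen to match the endings listed in the docstring (``$ ``, ``# ``, ``> ``, ``]# ``); the actual default pattern is not shown in this diff, and `looks_like_prompt` is a hypothetical name. The idle-time check on the transcript log is not modelled here.

```python
import re
from typing import Optional

# Assumed pattern: the real default regex is not visible in this diff.
ASSUMED_PROMPT_PATTERN = r"[$#>]\s?$"


def looks_like_prompt(transcript: str, prompt_pattern: Optional[str] = None) -> bool:
    """Return True when the last non-empty line of transcript ends like a prompt."""
    pattern = re.compile(prompt_pattern or ASSUMED_PROMPT_PATTERN)
    # Walk backwards so trailing blank lines are skipped.
    for line in reversed(transcript.splitlines()):
        if line.strip():
            return pattern.search(line) is not None
    # An empty transcript has no prompt line at all.
    return False
```

A caller with an unusual sub-shell prompt would pass its own ``prompt_pattern``, which is the workaround the docstring recommends for ``open_tab(command="bash")`` sessions.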
@ -642,7 +846,12 @@ def wait_for_prompt(
@mcp.tool()
def close_tab(tab_id: str) -> dict[str, Any]:
"""Close a tab in the prototype GUI."""
"""Close and destroy a terminal tab by its tab_id.
The tab_id is permanently removed; any subsequent calls referencing it
will fail. Obtain valid tab_ids from ``list_tabs()`` or from the return
value of ``open_tab()``.
"""
return call_gui("close_tab", tab_id=tab_id)
@ -653,7 +862,23 @@ def capture_screenshot(
path: str | None = None,
diagnostic_overlay: bool = False,
) -> dict[str, Any]:
"""Capture a PNG screenshot of the full window, the VTE content of a tab, or a tab container and return JSON-serializable metadata."""
"""Capture a PNG screenshot and return path and metadata.
``target`` values:
"window": the entire shared GUI window (tab_id not required)
"tab": only the VTE terminal content area of a tab
"tab-container": the tab widget including its header bar
``tab_id`` is required for target="tab" and target="tab-container".
``path`` overrides the default output path inside the screenshot directory;
if omitted a timestamped name is generated automatically.
``diagnostic_overlay`` draws a semi-transparent info panel on the image.
KNOWN LIMITATION: capturing both target="tab" and target="tab-container"
for the same tab_id within the same second will produce the same default
filename and the second capture will silently overwrite the first. Pass an
explicit ``path`` to each call to avoid this.
"""
result = call_gui(
"capture_screenshot",
target=target,
