feat: Enhance server functionality with detailed command execution instructions and improve test report documentation

3 months ago · 37e7517b62
3 changed files with 659 additions and 35 deletions
--- a/TEST_REPORT.md
+++ b/TEST_REPORT.md
@@ -0,0 +1,383 @@
+# mcp-hal9002 Test Report
+
+Date: March 15, 2026 (updated March 16, 2026)
+
+## Objective
+
+Track the test scenarios run against the MCP tools of `mcp-hal9002`, with outcomes, operational notes, and observed anomalies. This document is updated after every significant test campaign.
+
+## Scope Covered
+
+Tools covered during the sessions:
+
+- `gui_status()`
+- `open_gui()`, implicitly via `open_tab(...)`
+- `close_gui()`
+- `open_tab(...)`
+- `list_tabs()`
+- `focus_tab(...)`
+- `exec_command(...)`
+- `read_tab(...)`
+- `read_last_command_result(...)`
+- `wait_for_command(...)`
+- `wait_for_running_command(...)`
+- `wait_for_command_result(...)`
+- `wait_for_prompt(...)`
+- `capture_screenshot(...)`
+- `close_tab(...)`
+
+## Scenario Matrix
+
+### 1. GUI Lifecycle (T1)
+
+Initial state verified with the GUI already running and leftover tabs from previous tests.
+
+Scenarios run:
+
+- full shutdown of the shared GUI with `close_gui()`
+- check that `running=false` after `close_gui()`
+- implicit restart of the GUI via `open_tab(...)`
+- check that the shared instance is reused via the local socket
+- `gui_status()` with the GUI stopped → `running=false`
+- `gui_status()` with the GUI running → geometry fields and `tab_count` present
+
+Result: `PASS`
+
+Notes:
+
+- the GUI shuts down cleanly and the socket is removed
+- a new `open_tab(...)` relaunches the GUI as expected
+- `gui_status()` does not launch the GUI, which is the correct behavior
+
+### 2. Tab Lifecycle (T2)
+
+Scenarios run:
+
+- opening several tabs with different `title` values
+- opening a tab with an explicit `cwd`
+- checking the tab list with `list_tabs()`
+- switching the active tab with `focus_tab(...)`
+- closing an intermediate tab with `close_tab(...)`
+- checking that the remaining IDs stay consistent after the close
+- `focus_tab(...)` on a nonexistent `tab_id` → expected RuntimeError
+
+Result: `PASS`
+
+Notes:
+
+- tab IDs remain consistent after intermediate closes
+- the `active` state correctly follows the focused tab
+- errors for a nonexistent `tab_id` are uniform across all tab-scoped tools
+
+### 3. Command Execution with Auto Submit (T3)
+
+Scenarios run:
+
+- `exec_command(tab_id, "pwd", auto_submit=True)` on a tab with `cwd=/home/enne2`
+- `exec_command(tab_id, "pwd", auto_submit=True)` on a tab with `cwd=/home/enne2/dev/mcp-hal9002`
+- `exec_command(tab_id, "whoami", auto_submit=True)`
+- `exec_command(tab_id, "ls /definitely-missing-path", auto_submit=True)`
+- `exec_command(tab_id, "uname -r", auto_submit=True)`
+
+Result: `PASS`
+
+Notes:
+
+- `auto_submit=True` sends the command and waits only for it to finish
+- the `cwd` observed in the results matches the tab's directory
+- runtime errors of the command (`exit_code=1`) are tracked correctly
+- the returned `after_sequence` is ready to pass to `wait_for_command_result`
+
+### 4. after_sequence Pattern (T4)
+
+Scenarios run:
+
+- submit `ls /tmp` with `auto_submit=True`, saving `after_sequence`
+- call `wait_for_command_result(tab_id, after_sequence=...)` with the value obtained
+- submit a second command and check that a different `after_sequence` isolates separate results
+
+Result: `PASS`
+
+Notes:
+
+- `after_sequence` makes it possible to isolate exactly the command just issued
+- results include `sequence`, `command`, `cwd`, `cwd_after`, `exit_code`, `duration_seconds`, `text`
+- the `text` field contains the output captured between the shell markers, stripped of prompts and echo
+
+### 5. Reading Output and Results (T5)
+
+Scenarios run:
+
+- reading the raw scrollback with `read_tab(...)` after an `exec_command`
+- reading the last tracked result with `read_last_command_result(...)`
+- waiting for the result with `wait_for_command_result(...)`
+- comparing metadata between `read_last_command_result(...)` and `wait_for_command_result(...)`
+- usage distinction: `read_tab` for raw output, `read_last_command_result` for structured metadata
+
+Result: `PASS`
+
+Notes:
+
+- `read_tab` returns raw text with prompts and echo: useful for debugging and delegated sessions
+- `read_last_command_result` provides clean output between the hook markers: useful for structured parsing
+- the two methods converge on the same metadata for the same completed command
+- `read_tab` is the only practical option in delegated sessions, where command events are not emitted
+
+### 6. Waits and Timeouts (T6)
+
+Scenarios run:
+
+- explicit pause with `wait_for_command(delay_seconds=2.0)`
+- short timeout on a long-running command with `wait_for_running_command(tab_id, timeout=0.5)` → expected RuntimeError
+- `wait_for_running_command(tab_id)` on an idle tab → RuntimeError "no tracked command is currently running"
+- long wait for completion on the same running command
+- waiting for a result with the correct `after_sequence` on `du -ah /usr`
+
+Result: `PASS`
+
+Notes:
+
+- `wait_for_running_command` on an idle tab fails immediately, as expected
+- the short timeout fails correctly while the command is still running
+- the long wait leads to the expected completion
+- `wait_for_running_command(...)` and `wait_for_command_result(...)` return consistent metadata for the same final event
+
+### 7. Screenshot: Target and Naming (T7 / T8)
+
+Scenarios run:
+
+- screenshot with `target="window"` and diagnostic overlay
+- screenshot with `target="tab"` and diagnostic overlay
+- screenshot with `target="tab-container"` and diagnostic overlay
+- both `target="tab"` and `target="tab-container"` on the same `tab_id` within the same second → naming collision confirmed
+- same scenario with an explicit `path` for each call → no collision
+
+Result: `PASS` with a known anomaly (see Anomaly 1)
+
+Notes:
+
+- capture works for all targets tried
+- the JSON sidecar metadata is populated correctly
+- the naming collision is reproducible and confirmed
+- the workaround with an explicit `path` fully resolves the collision
+- the `summary` field in the result contains the path, dimensions, widget type, and renderer
+
+### 8. Errors and Guardrails (T8)
+
+Scenarios run:
+
+- `exec_command(..., auto_submit=True)` with blocked shell syntax: `ls | head`
+- `exec_command(..., auto_submit=True)` with blocked shell syntax: `echo hi && echo bye`
+- `exec_command(..., auto_submit=True)` with a forbidden command: `python3 -c 'print(1)'`
+- `exec_command(..., auto_submit=True)` with a command not on the whitelist: `jq ...`
+- `focus_tab(...)` on a nonexistent `tab_id`
+- `read_tab(...)` on a nonexistent `tab_id`
+- `close_tab(...)` on a nonexistent `tab_id`
+
+Result: `PASS`
+
+Notes:
+
+- error messages are consistent with the guardrail rules
+- errors are raised before the submit, not after
+- no corrupted terminal state was observed after tool-side errors
+
+### 9. open_tab(command=...) Behavior (T9)
+
+Scenarios run:
+
+- opening a tab with `open_tab(command="uname -r")` (terminating command)
+- check with `read_last_command_result(...)` after ~500 ms
+- opening a tab with `open_tab(command="bash --norc")` (persistent sub-shell)
+- attempt to call `read_last_command_result(...)` in the tab with the sub-shell
+
+Result: `PASS` with a critical distinction
+
+Notes:
+
+- **terminating commands** (`uname -r`, `ls /tmp`, etc.) injected via `command=` **are tracked** as
+  `sequence=1` with the normal command event metadata
+- **persistent sub-shells** (`bash`, `ssh`, `python3` REPL) started via `command=` **are not tracked**:
+  the sub-shell does not inherit the MCP hook, so `read_last_command_result` raises RuntimeError
+- this distinction was documented incorrectly in the `read_last_command_result` docstring; it was corrected during the test
+
+### 10. Manual Submit (T11)
+
+Scenarios run:
+
+- `exec_command(tab_id, "echo 'ciao dal test'", auto_submit=False)` → blocking call
+- the user presses Enter in the GUI after 2–3 seconds
+- check that `exec_command` returns `submitted_manually=true`
+- check the `newline_ignored` field in the result
+- call `wait_for_command_result(tab_id, after_sequence=...)` on the result
+
+Result: `PASS`
+
+Notes:
+
+- `exec_command` blocks until the user presses Enter in the terminal
+- the `submitted_manually=true` field correctly identifies the mode
+- `after_sequence` is also available in manual mode to filter for the expected result
+- `newline_ignored=false` in manual mode (the newline is part of the user action)
+
+### 11. Waiting on a Long Command with wait_for_running_command (T12)
+
+Scenarios run:
+
+- the user manually types and submits `sleep 4` in the GUI
+- immediate call to `wait_for_running_command(tab_id)` from the tool
+- check that the tool waits for the sleep to complete
+- check metadata in the result: `duration_seconds` ≥ 4
+
+Result: `PASS`
+
+Notes:
+
+- `wait_for_running_command` correctly detects the `running` state and waits for completion
+- the `duration_seconds` field reflects the actual execution time
+- this scenario has no `after_sequence`, demonstrating the proper use case for `wait_for_running_command`
+
+### 12. Delegated Sessions: in_delegated_session (T13 + S1–S7)
+
+This section covers the manually started sub-shell and the `in_delegated_session` flag.
+
+#### T13: First manual sub-shell scenario
+
+Scenarios run:
+
+- `exec_command(tab_id, "bash --norc", auto_submit=False)` with the user pressing Enter → sub-shell active
+- check that the tab state becomes `interactive-session`
+- injection of `echo test` via `exec_command(tab_id, "echo test", auto_submit=False)` into the sub-shell
+- check that the response contains `in_delegated_session: true`
+
+Result: `PASS`
+
+Notes:
+
+- the `in_delegated_session` flag is present in the response for both `auto_submit=True` and `False`
+- injection into the sub-shell works at the VTE level but does not produce tracked command events
+
+#### Sub-shell campaign S1–S7
+
+Scenarios run:
+
+- **S1**: fresh tab opened, manual submit of `bash --norc` via the GUI
+  Check: `running_command_status(tab_id)` reports `state="interactive-session"` → `PASS`
+
+- **S2**: `exec_command(tab_id, "echo test_s2", auto_submit=False)` in the active sub-shell,
+  the user presses Enter
+  Check: the response contains `in_delegated_session: true` → `PASS`
+
+- **S3**: `read_tab(tab_id)` on the same tab
+  Check: the raw VTE output contains `test_s2` → `PASS`
+
+- **S4**: `wait_for_running_command(tab_id)` → immediate RuntimeError because of the `interactive-session` state
+  `wait_for_command_result(tab_id, after_sequence=..., timeout=3)` → RuntimeError on timeout
+  Check: both fail as expected; the correct path is `read_tab` + `wait_for_prompt` → `PASS`
+
+- **S5**: the user types `exit` in the sub-shell → return to the MCP parent shell
+  `wait_for_prompt(tab_id, timeout=10)` → parent prompt detected
+  Check: `state="prompt"`, `last_line` contains the MCP shell prompt → `PASS`
+
+- **S6**: `exec_command(tab_id, "whoami", auto_submit=True)` after exiting the sub-shell
+  Check: works normally, `in_delegated_session: false` → `PASS`
+  Note: an accidental manual action in the GUI added spurious text, which was removed with Ctrl+C
+
+- **S7**: cleanup with `close_tab(tab_id)`
+  Check: tab closed correctly → `PASS`
+
+Overall result: `PASS`
+
+Technical notes:
+
+- the GUI tracks sub-shells in `TabState` with `delegated_session_*` fields
+- `rpc_exec` in `gui.py` detects `state="interactive-session"` and sets `pending_submit_in_delegated_session=True`
+- the `in_delegated_session` flag is propagated on both `exec_command` paths in `server.py`
+- `wait_for_running_command` raises immediately for `interactive-session` (correct behavior)
+- `wait_for_prompt` is the recommended observation tool in delegated sessions
+
+## Observed Anomalies
+
+### 1. Screenshot Naming Collision
+
+Status: **OPEN** (workaround available)
+
+Symptom:
+
+- captures with `target="tab"` and `target="tab-container"` on the same `tab_id` within the same second
+  produce the same default path; the second silently overwrites the first
+
+Impact:
+
+- one PNG or JSON file can overwrite the other
+
+Confirmed workaround:
+
+- passing an explicit `path` to both calls fully resolves the problem (verified in T7/T8)
+
+Affected area:
+
+- default screenshot path generation in `gui.py`
+
+### 2. Tracking of Interactive Startup Commands
+
+Status: **PARTIALLY RESOLVED**
+
+Original symptom:
+
+- `open_tab(command="bash")` produced output but made neither the prompt nor the command events observable
+
+Post-T9 update:
+
+- **terminating commands** (`uname -r`, `ls`, etc.) injected via `open_tab(command=...)` **are confirmed to be tracked**
+  correctly (`sequence=1`, `exit_code=0`)
+- **persistent sub-shells** (`bash`, `python3 REPL`, `ssh`) started via `open_tab(command=...)` remain untracked
+  because the sub-shell does not inherit the MCP hook
+
+Residual impact:
+
+- only the persistent sub-shell case via `open_tab(command=...)` is still untracked
+- the most useful case (a sub-shell started via a manual `exec_command`) is fully supported
+  through the `in_delegated_session` flag + `read_tab` + `wait_for_prompt`
+
+### 3. Execution Timestamps to Recheck
+
+Status: **OPEN** (low impact under normal conditions)
+
+Symptom:
+
+- some observed `started_at` and `duration_seconds` values appear to start before the actual perceived submit
+
+Impact:
+
+- the timing metadata may be less reliable than expected in command reports
+
+Affected area:
+
+- shell hook that writes the transcript and events
+
+## Cases Still to Be Tested Manually
+
+All cases listed in the previous version of this document have been covered:
+
+- `exec_command(..., auto_submit=False)` with the command edited → **covered in T11**
+- `newline=True` ignored in manual mode → **confirmed in T11** (`newline_ignored` in the result)
+- delegated sessions (sub-shell, SSH) started with manual submit → **covered in T13 + S1–S7**
+- combined use of `wait_for_prompt(...)` and `wait_for_running_command(...)` in interactive sessions → **covered in S4+S5**
+
+No relevant case remains to be tested under current conditions.
+
+## Final State of the Test Environment
+
+Cleanup performed after each campaign:
+
+- closing the tabs created for the tests
+- final check that only the Home tab is present or the GUI is stopped
+
+Cleanup result: `PASS` for all campaigns
+
+## Conclusion
+
+Most of the MCP tools tested work in non-interactive cases and in the standard terminal observation flows. Delegated sessions (sub-shell, SSH) are now supported through the `in_delegated_session` flag in the `exec_command` output and the combined use of `read_tab` + `wait_for_prompt`.
+
+Two residual issues remain open: the screenshot naming collision (workaround available with an explicit `path`) and the minor uncertainty about execution timestamps.
--- a/src/gnome_vte_mcp/gui.py
+++ b/src/gnome_vte_mcp/gui.py
@@ -67,6 +67,7 @@ class TabState:
    delegated_session_command: str | None
    delegated_session_started_at: str | None
    delegated_session_after_sequence: int | None
+    pending_submit_in_delegated_session: bool = False


 class TerminalWindow(Gtk.ApplicationWindow):
@@ -519,6 +520,7 @@ class TerminalApp(Gtk.Application):
            delegated_session_command=None,
            delegated_session_started_at=None,
            delegated_session_after_sequence=None,
+            pending_submit_in_delegated_session=False,
        )
        self.tabs[tab_id] = tab
        show_onboarding = len(self.tabs) == 1 and command is None
@@ -530,7 +532,7 @@ class TerminalApp(Gtk.Application):
        self._write_shell_rc_file(tab)
        self._spawn_shell(tab, cwd=working_directory)
        if command:
-            GLib.timeout_add(150, self._feed_terminal_input, tab_id, command + "\n")
+            GLib.timeout_add(150, self._feed_terminal_input, tab_id, command, True)
        self.window.present()
        return self._serialize_tab(tab)

@@ -649,11 +651,16 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
            None,
        )

-    def _feed_terminal_input(self, tab_id: str, payload: str) -> bool:
+    def _send_terminal_enter(self, terminal: Vte.Terminal) -> None:
+        terminal.feed_child(b"\r")
+
+    def _feed_terminal_input(self, tab_id: str, payload: str, submit: bool = False) -> bool:
        tab = self.tabs.get(tab_id)
        if tab is None:
            return False
        tab.terminal.paste_text(payload)
+        if submit:
+            self._send_terminal_enter(tab.terminal)
        return False

    def _looks_like_interactive_handoff(self, command: str) -> bool:
@@ -723,7 +730,8 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
            raise RuntimeError(f"tab_id={tab.tab_id} has no pending submit to finalize")

        after_sequence = self._latest_command_sequence(tab)
-        delegated = self._looks_like_interactive_handoff(tab.pending_submit_text or "")
+        delegated = tab.pending_submit_in_delegated_session or self._looks_like_interactive_handoff(tab.pending_submit_text or "")
+        tab.pending_submit_in_delegated_session = False
        tab.last_manual_submit_id = tab.pending_submit_id
        tab.last_manual_submit_text = tab.pending_submit_text
        tab.last_manual_submit_at = submitted_at
@@ -862,7 +870,11 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
        try:
            return self.tabs[tab_id]
        except KeyError as exc:
-            raise RuntimeError(f"Unknown tab_id: {tab_id}") from exc
+            raise RuntimeError(
+                f"Unknown tab_id: {tab_id!r}. "
+                "Call open_tab() to create a tab and get a valid tab_id, "
+                "or list_tabs() to see existing tab IDs."
+            ) from exc

    def _sanitize_text(self, text: str) -> str:
        cleaned = OSC_ESCAPE_RE.sub("", text)
@@ -1201,8 +1213,7 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
            )
        if running_status["state"] == "completed":
            self._clear_current_execution(tab)
-        if running_status["state"] == "interactive-session":
-            self._clear_current_execution(tab)
+        in_delegated_session = running_status["state"] == "interactive-session"
        if tab.pending_submit_id is not None:
            raise RuntimeError(
                f"tab_id={tab_id} already has a pending manual submit request; press Enter in the GUI before writing another command"
@@ -1219,8 +1230,11 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
        tab.pending_submit_id = submission_id
        tab.pending_submit_text = command
        tab.pending_submit_requested_at = requested_at
+        if in_delegated_session:
+            tab.pending_submit_in_delegated_session = True
        if auto_submit:
-            tab.terminal.paste_text(payload + "\n")
+            tab.terminal.paste_text(payload)
+            self._send_terminal_enter(tab.terminal)
            submitted_at = datetime.now().astimezone().isoformat(timespec="milliseconds")
            after_sequence = self._finalize_submission(tab, submitted_at=submitted_at)
            return {
@@ -1233,6 +1247,7 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
                "submitted_at": submitted_at,
                "current_sequence": self._latest_command_sequence(tab),
                "newline_ignored": False,
+                "in_delegated_session": in_delegated_session,
            }

        tab.terminal.paste_text(payload)
@@ -1244,6 +1259,7 @@ PROMPT_COMMAND='__mcp_hal9002_prompt_hook'
            "submission_id": submission_id,
            "requested_at": requested_at,
            "newline_ignored": newline,
+            "in_delegated_session": in_delegated_session,
        }

    def rpc_manual_submit_status(self, tab_id: str, submission_id: str) -> dict[str, Any]:
--- a/src/gnome_vte_mcp/server.py
+++ b/src/gnome_vte_mcp/server.py
@@ -20,7 +20,23 @@ from mcp.server.fastmcp import FastMCP
 from .control import ControlError, default_socket_path, request, wait_for_socket


-mcp = FastMCP("mcp-hal9002")
+mcp = FastMCP(
+    "mcp-hal9002",
+    instructions=(
+        "This server controls a live GTK/VTE terminal on the user's desktop. "
+        "Call exec_command() to run commands directly — do not ask the user for permission first.\n\n"
+        "Standard workflow:\n"
+        "1. exec_command('command', auto_submit=True) → runs command immediately (tab is auto-created)\n"
+        "2. wait_for_command_result(tab_id, after_sequence=result['after_sequence']) → returns output, exit_code, cwd\n\n"
+        "tab_id is optional in exec_command(). When omitted, the first open tab is used automatically "
+        "(or a new tab is created). Only call open_tab() explicitly when you need a specific tab.\n\n"
+        "auto_submit=True is allowed for these read-only commands:\n"
+        "  cat date df du echo env file find free grep head hostname id ip ls\n"
+        "  netstat pgrep printenv ps pwd readlink rg ss stat tail uname uptime whoami\n\n"
+        "All other commands (bash, python3, sudo, rm, cp, ssh, …) require auto_submit=False: "
+        "the text is pasted into the terminal and the user must press Enter in the GUI to confirm."
+    ),
+)
 COMMAND_START_MARKER = "__MCP_HAL9002_CMD_START__"
 COMMAND_END_MARKER = "__MCP_HAL9002_CMD_END__"
 ANSI_ESCAPE_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
@@ -393,7 +409,12 @@ def call_gui(method: str, *, ensure_running: bool = True, **params: Any) -> Any:

@mcp.tool()
 def gui_status() -> dict[str, Any]:
-    """Return GUI lifecycle state without starting a new window."""
+    """Return GUI lifecycle state without starting a new window.
+
+    Returns ``running=true`` plus window geometry and tab count when the GUI
+    process is active, or ``running=false`` when it has not been started yet.
+    Safe to call at any time; never auto-launches the GUI.
+    """
    socket_path = _socket_path()
    if not _is_gui_running(socket_path):
        return {
@@ -408,14 +429,26 @@ def gui_status() -> dict[str, Any]:

@mcp.tool()
 def open_gui() -> dict[str, Any]:
-    """Start the GUI if needed and present the shared window."""
+    """Start the GUI if not already running and present the shared window.
+
+    The GUI is a single shared instance; calling this when it is already running
+    is idempotent (brings the window to the foreground). To get a usable
+    terminal tab in one step, call ``open_tab()`` instead — it auto-launches
+    the GUI if needed.
+    """
    ensure_gui()
    return call_gui("show_window")


@mcp.tool()
 def close_gui() -> dict[str, Any]:
-    """Close the shared GUI window if it is currently running."""
+    """Close the shared GUI window and terminate the GUI process.
+
+    All open tabs are destroyed. Subsequent tool calls that require the GUI
+    will auto-relaunch it as a new process. Returns ``closed=true`` if the
+    process was running; returns ``closed=false`` without error when it was
+    already stopped.
+    """
    socket_path = _socket_path()
    if not _is_gui_running(socket_path):
        return {
@@ -431,39 +464,128 @@ def close_gui() -> dict[str, Any]:

@mcp.tool()
 def open_tab(title: str | None = None, cwd: str | None = None, command: str | None = None) -> dict[str, Any]:
-    """Open a new terminal tab in the GTK/VTE prototype window."""
+    """Open a new terminal tab and return its ``tab_id``.
+
+    Parameters
+    ----------
+    title:
+        Label shown on the tab header.
+    cwd:
+        Working directory for the shell. Defaults to the user home directory.
+    command:
+        Shell command injected ~150 ms after the terminal spawns.
+        Commands injected here that **terminate on their own** (e.g.,
+        ``uname -r``, ``ls``) are tracked normally and visible via
+        ``read_last_command_result()`` and ``wait_for_command_result()``.
+        Commands that start a **persistent interactive sub-shell** (e.g.,
+        ``bash``, ``ssh``, ``python3`` REPL) are NOT tracked: the sub-shell
+        inherits neither the MCP shell hook nor its prompt markers, so
+        ``read_last_command_result()`` will raise and ``wait_for_prompt()``
+        may time out.
+        For a tracked interactive session opened with a typed command, open
+        the tab first and then call ``exec_command()`` without auto_submit.
+
+    The GUI is auto-launched if not already running.
+    """
    return call_gui("open_tab", title=title, cwd=cwd, command=command)


@mcp.tool()
 def list_tabs() -> list[dict[str, Any]]:
-    """List the currently known tabs managed by the prototype GUI."""
+    """List all currently open tabs managed by the GUI.
+
+    Returns a list of dicts, each with ``tab_id``, ``title``, ``cwd``, and
+    ``active`` (whether it is the focused tab). Use tab_ids from this list
+    with all other tab-scoped tools.
+    """
    return call_gui("list_tabs")


@mcp.tool()
 def focus_tab(tab_id: str) -> dict[str, Any]:
-    """Bring a specific tab to the foreground in the prototype GUI."""
+    """Bring a specific tab to the foreground in the GUI window.
+
+    Does not affect command execution or tracking. Useful before a screenshot
+    to ensure the desired tab is visible in ``target="window"`` captures.
+    """
    return call_gui("focus_tab", tab_id=tab_id)


@mcp.tool()
 def exec_command(
-    tab_id: str,
    command: str,
+    tab_id: str | None = None,
    newline: bool = True,
    auto_submit: bool = False,
    poll_interval: float = 0.1,
 ) -> dict[str, Any]:
-    """Write command text into an existing tab and block until the user manually presses Enter in the GUI.
-
-    By default the tool blocks indefinitely until manual submission is detected, not after command completion.
-    When `auto_submit=True`, it immediately sends Enter too, but only for small read-only commands.
-    Use wait_for_command_result() to wait for the command to finish and collect output.
+    """Inject command text into a terminal tab and optionally auto-submit it.
+
+    ``tab_id`` is optional. When omitted (or None), the first open tab is used
+    automatically. If no tab exists yet, one is created on the fly. Pass an
+    explicit ``tab_id`` (from ``open_tab()`` or ``list_tabs()``) only when you
+    need to target a specific tab among several.
+
+    Default mode (``auto_submit=False``):
+      Paste the command text and block indefinitely until the user manually
+      presses Enter in the GUI window. The ``newline`` parameter controls
+      whether a newline hint is appended to the pasted text in the input line.
+      Returns ``after_sequence`` — pass this to ``wait_for_command_result()``
+      to await the result of exactly this command.
+
+    ``auto_submit=True`` mode:
+      Immediately paste the text and send an Enter keystroke without any human
+      interaction. Allowed only for a short whitelist of read-only commands:
+      cat, date, df, du, echo, env, file, find, free, grep, head, hostname,
+      id, ip, ls, netstat, pgrep, printenv, ps, pwd, readlink, rg, ss, stat,
+      tail, uname, uptime, whoami.
+      Shell metacharacters (; && | > < ` $()) are blocked. Commands such as
+      bash, python3, sudo, rm, cp, mv, ssh are also blocked.
+      The ``newline`` parameter is ignored when ``auto_submit=True``.
+
+    Typical pattern for auto_submit (no tab setup needed):
+      1. result = exec_command("pwd", auto_submit=True)
+      2. outcome = wait_for_command_result(result["tab_id"],
+                       after_sequence=result["after_sequence"])
+
+    For a command that has already been manually submitted and is still
+    running, use ``wait_for_running_command()`` instead.
+
+    Delegated interactive sessions (sub-shell, SSH, REPL):
+      ``exec_command()`` can still inject text into a delegated session —
+      the response will include ``in_delegated_session: true``. In that state
+      tracked command-completion events are NOT emitted by the sub-shell, so
+      do NOT call ``wait_for_command_result()`` or ``wait_for_running_command()``
+      afterward. Instead use ``read_tab()`` to observe output and
+      ``wait_for_prompt()`` to detect when the session returns to the parent
+      shell, or ``wait_for_command()`` for a fixed sleep.
+
+    Returns
+    -------
+    tab_id (str), written_text (str), submission_id (str), requested_at (str),
+    submitted_at (str), submitted_manually (bool), after_sequence (int — the
+    minimum sequence required by ``wait_for_command_result`` to target this
+    specific command), current_sequence (int), newline_ignored (bool),
+    in_delegated_session (bool).
+
+    Example — auto_submit two-step flow
+    ------------------------------------
+    r = exec_command("ls /tmp", auto_submit=True)
+    # r["in_delegated_session"] is False → safe to proceed
+    out = wait_for_command_result(r["tab_id"], after_sequence=r["after_sequence"])
+    # out["exit_code"] == 0, out["text"] contains directory listing
    """
    if auto_submit:
        _auto_submit_guard(command)

+    if tab_id is None:
+        tabs = call_gui("list_tabs")
+        if tabs:
+            tab_id = tabs[0]["tab_id"]
+        else:
+            new_tab = call_gui("open_tab", title=None, cwd=None, command=None)
+            tab_id = new_tab["tab_id"]
+
    result = call_gui("exec", tab_id=tab_id, command=command, newline=newline, auto_submit=auto_submit)
    if auto_submit:
        return {
@@ -476,6 +598,7 @@ def exec_command(
            "after_sequence": result["after_sequence"],
            "current_sequence": result["current_sequence"],
            "newline_ignored": False,
+            "in_delegated_session": result.get("in_delegated_session", False),
        }

    submitted = _wait_for_manual_submit(
@@ -493,14 +616,19 @@ def exec_command(
        "after_sequence": result["after_sequence"],
        "current_sequence": submitted["current_sequence"],
        "newline_ignored": result["newline_ignored"],
+        "in_delegated_session": result.get("in_delegated_session", False),
    }


@mcp.tool()
 def wait_for_command(delay_seconds: float) -> dict[str, Any]:
-    """Block synchronously for a fixed amount of time.
+    """Sleep for a fixed number of seconds (simple timer helper).

-    This is a simple sleep helper for workflows driven by repeated `read_tab()` calls.
+    This tool does NOT monitor any terminal event or command state — it is a
+    plain sleep. Use it only as a last-resort delay between ``read_tab()``
+    polls when the more structured ``wait_for_command_result()`` or
+    ``wait_for_running_command()`` cannot be applied (e.g., an arbitrary
+    interactive process with no tracked command events).
    """
    if delay_seconds < 0:
        raise RuntimeError("delay_seconds must be >= 0")
@@ -520,13 +648,43 @@ def wait_for_command(delay_seconds: float) -> dict[str, Any]:

@mcp.tool()
 def read_tab(tab_id: str, last_n_lines: int = 200) -> dict[str, Any]:
-    """Read the trailing scrollback text from a tab."""
+    """Return the trailing scrollback text from a tab as a raw string.
+
+    Returns up to ``last_n_lines`` lines of ANSI-stripped terminal transcript.
+    The output is raw and mixes prompts, echoed input, and command output.
+
+    PREFER THIS WHEN:
+      - the tab is inside a delegated session (``in_delegated_session=true``)
+        where tracked command events are unavailable
+      - you want raw terminal state or partial output while a command runs
+    PREFER ``read_last_command_result()`` WHEN you need structured metadata:
+      exit_code, timing, cwd, or clean isolated output for a completed command.
+    """
    return call_gui("read_tab", tab_id=tab_id, last_n_lines=last_n_lines)


@mcp.tool()
 def read_last_command_result(tab_id: str) -> dict[str, Any]:
-    """Read the last completed command result for a tab, including cwd and execution timestamps."""
+    """Return the most recent tracked command result for a tab.
+
+    Returns a dict with: command, cwd, cwd_after, started_at, finished_at,
+    duration_seconds, exit_code, and text (the captured output between the
+    shell hook start and end markers, clean of prompts and echoed input).
+
+    Tracked commands are those submitted via ``exec_command()`` and startup
+    commands injected via ``open_tab(command=...)`` that run to completion.
+    Persistent interactive sub-shells started via ``open_tab(command="bash")``
+    are NOT tracked (they inherit no MCP shell hook).
+    Raises RuntimeError when no tracked command has completed yet for this tab.
+
+    PREFER THIS over ``read_tab()`` when you need structured metadata:
+    exit_code, timing, cwd, or clean isolated output for a completed command.
+    PREFER ``read_tab()`` when the tab is in a delegated session
+    (``in_delegated_session=true``) or when you need the raw scrollback.
+
+    To wait for a specific future command, use ``wait_for_command_result()``
+    with the ``after_sequence`` from ``exec_command()``.
+    """
    socket_path = ensure_gui()
    event = _latest_command_event(socket_path, tab_id)
    if event is None:
@ -536,9 +694,24 @@ def read_last_command_result(tab_id: str) -> dict[str, Any]:

@mcp.tool()
 def wait_for_command_result(tab_id: str, after_sequence: int | None = None, timeout: float | None = None, poll_interval: float = 0.1) -> dict[str, Any]:
-    """Block until the next completed command is observed in a tab and return its tracked result.
+    """Block until the next tracked command completes and return its result.
+
+    CHOOSE THIS TOOL when you called ``exec_command()`` and have
+    ``after_sequence``; it targets the exact command you submitted.
+    CHOOSE ``wait_for_running_command()`` when a command was manually
+    submitted and you have no ``after_sequence``.
+
+    Pass ``after_sequence=result["after_sequence"]``, where ``result`` is the
+    dict returned by ``exec_command()``. When ``after_sequence`` is omitted,
+    the latest sequence at call time is used as the baseline.

-    When `timeout` is omitted or set to a non-positive value, wait indefinitely.
+    A ``timeout`` of None or <= 0 means wait indefinitely. Raises
+    RuntimeError on timeout or if the tab does not exist.
+
+    Returns: tab_id (str), sequence (int), command (str), cwd (str),
+    cwd_after (str), started_at (str), finished_at (str),
+    duration_seconds (float), exit_code (int), text (str — clean command
+    output between the shell hook start and end markers).
    """
    socket_path = ensure_gui()
    baseline = after_sequence
@ -558,10 +731,24 @@ def wait_for_command_result(tab_id: str, after_sequence: int | None = None, time

@mcp.tool()
 def wait_for_running_command(tab_id: str, timeout: float | None = None, poll_interval: float = 0.1) -> dict[str, Any]:
-    """Block until the command currently executing in a tab finishes and return its tracked result.
+    """Block until the command currently executing in a tab finishes.
+
+    CHOOSE THIS TOOL when a command was submitted manually, the terminal
+    is still busy, and you have no ``after_sequence`` from
+    ``exec_command()``.
+    CHOOSE ``wait_for_command_result()`` when you called ``exec_command()``
+    and have ``after_sequence``; it targets that specific command precisely.
+
+    Returns the tracked result of the command that was running once it
+    finishes.
+
+    Raises RuntimeError immediately if the tab is idle and no prior running
+    command was observed in this call. Also raises if the tab is inside an
+    interactive delegated session (e.g., SSH or a nested shell) — tracked
+    command completion is unavailable in that state; use ``wait_for_prompt()``
+    instead.

-    Use this when a command has already been manually submitted and the terminal is still busy.
-    When `timeout` is omitted or set to a non-positive value, wait indefinitely.
+    A ``timeout`` of None or <= 0 means wait indefinitely.
    """
    socket_path = ensure_gui()
    deadline = None if timeout is None or timeout <= 0 else time.monotonic() + timeout
@ -602,10 +789,27 @@ def wait_for_prompt(
    idle_seconds: float = 0.4,
    prompt_pattern: str | None = None,
 ) -> dict[str, Any]:
-    """Block until terminal output becomes idle and the trailing line looks like a shell prompt.
-
-    This is intended for delegated interactive sessions such as SSH, where tracked command markers
-    are unavailable. When `prompt_pattern` is omitted, a conservative default prompt regex is used.
+    """Block until terminal output goes idle and the last line looks like a shell prompt.
+
+    Intended for delegated interactive sessions (e.g., SSH, remote shells)
+    where tracked command-completion events are unavailable. Detection is
+    heuristic: the transcript log must be stable for at least ``idle_seconds``
+    and the last non-empty line must match ``prompt_pattern``. The default
+    pattern matches common endings like ``$ ``, ``# ``, ``> ``, ``]# ``.
+
+    LIMITATION: if ``open_tab(command="bash")`` was used to start a sub-shell,
+    the shell hook markers may not be injected into it, and this tool may time
+    out. In that case pass an explicit ``prompt_pattern`` that matches the
+    sub-shell's prompt, or start the interactive session with
+    ``exec_command()`` without auto_submit and submit it manually.
+
+    A ``timeout`` of None or <= 0 means wait indefinitely.
+
+    Raises RuntimeError:
+      ``"No transcript log exists yet for tab_id: {tab_id}"`` — the tab
+        transcript has not been initialised yet (tab too new or missing).
+      ``"Timed out waiting for a visible prompt on tab_id=..."`` — the
+        timeout elapsed before a stable prompt line was detected.
    """
    socket_path = ensure_gui()
    log_path = _tab_log_path(socket_path, tab_id)
@ -642,7 +846,12 @@ def wait_for_prompt(

@mcp.tool()
 def close_tab(tab_id: str) -> dict[str, Any]:
-    """Close a tab in the prototype GUI."""
+    """Close and destroy a terminal tab by its tab_id.
+
+    The tab_id is permanently removed; any subsequent calls referencing it
+    will fail. Obtain valid tab_ids from ``list_tabs()`` or from the return
+    value of ``open_tab()``.
+    """
    return call_gui("close_tab", tab_id=tab_id)


@ -653,7 +862,23 @@ def capture_screenshot(
    path: str | None = None,
    diagnostic_overlay: bool = False,
 ) -> dict[str, Any]:
-    """Capture a PNG screenshot of the full window, the VTE content of a tab, or a tab container and return JSON-serializable metadata."""
+    """Capture a PNG screenshot and return path and metadata.
+
+    ``target`` values:
+      "window"        — the entire shared GUI window (tab_id not required)
+      "tab"           — only the VTE terminal content area of a tab
+      "tab-container" — the tab widget including its header bar
+
+    ``tab_id`` is required for target="tab" and target="tab-container".
+    ``path`` overrides the default output path inside the screenshot directory;
+    if omitted a timestamped name is generated automatically.
+    ``diagnostic_overlay`` draws a semi-transparent info panel on the image.
+
+    KNOWN LIMITATION: capturing both target="tab" and target="tab-container"
+    for the same tab_id within the same second will produce the same default
+    filename and the second capture will silently overwrite the first. Pass an
+    explicit ``path`` to each call to avoid this.
+    """
    result = call_gui(
        "capture_screenshot",
        target=target,