# Development Status

## Current Milestone

Milestone 3 — SDL Video Viewport, HUD, and Wayland Compatibility

## Current Architecture Decisions

- **Language / UI**: Python 3.9+ with PySDL2 (ctypes wrapper around system SDL2), with UI layout metrics now computed from the runtime window/display size instead of being fixed to 640x480
- **DLNA Discovery**: Custom SSDP M-SEARCH implementation using asyncio datagrams + aiohttp for device description XML
- **Content Browsing**: Direct SOAP/XML ContentDirectory client with DIDL-Lite parser (no dependency on async-upnp-client browsing at runtime — only aiohttp)
- **Playback**: integrated GStreamer backend via `PyGObject` / `GstPlayBin`, decoding video into `GstAppSink` frames that are uploaded to SDL textures and rendered in the main SDL renderer
- **Playback viewport**: SDL scales decoded video into a dedicated playback viewport in the same render pass as the HUD, with full-width video bounds and a black playback backdrop outside the viewport
- **Concurrency**: Dedicated asyncio event loop in a daemon thread; thread-safe queues bridge it to the SDL2 main loop
- **Input**: Keyboard mapping for desktop testing + SDL2 GameController for R36S D-pad/buttons
- **Font**: Bundled package font preferred first; system font fallback kept only for development hosts
- **UI icons**: prefer bundled monochrome glyphs or a bundled icon font subset instead of depending on OS emoji fonts
- **Playback HUD**: SDL-rendered overlay is now simplified and compact, uses bundled playback icons, uses a smaller dedicated playback font with title ellipsis for 640x480 readability, supports auto-hide or fixed visibility, stays visible while playback is paused, and remains in the same SDL render pass as video
- **Wayland / DRM strategy**: playback no longer depends on native overlay sinks or X11 window handles; R36S-class targets continue to prefer `kmsdrm` when no display server is present
- **Deploy packaging**: a `conda-forge`-oriented `environment.yml` now defines a reproducible Miniforge/Miniconda environment for local development and release preparation
- **Python packaging**: direct runtime dependency declarations now explicitly include `aiohttp` instead of relying on transitive installation through other packages
- **ArkOS deploy layout**: on-device installs should place Miniforge and the git checkout under `/home/ark` to avoid the full `/roms` partition, while EmulationStation integration should stay lightweight under `/roms/ports`
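
The concurrency decision above (an asyncio loop in a daemon thread, bridged to the SDL2 main loop by thread-safe queues) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: `AsyncWorker`, `submit`, `drain`, and `fake_discovery` are hypothetical names chosen for the example.

```python
import asyncio
import queue
import threading


class AsyncWorker:
    """Runs an asyncio event loop in a daemon thread (hypothetical helper)."""

    def __init__(self):
        # Thread-safe queue that carries results back to the UI thread.
        self.results = queue.Queue()
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    def submit(self, tag, coro):
        """Schedule a coroutine from the UI thread; its result lands in self.results."""

        async def runner():
            try:
                self.results.put((tag, await coro))
            except Exception as exc:  # report failures instead of losing them
                self.results.put((tag, exc))

        asyncio.run_coroutine_threadsafe(runner(), self._loop)

    def drain(self):
        """Non-blocking poll, intended to be called once per UI frame."""
        items = []
        while True:
            try:
                items.append(self.results.get_nowait())
            except queue.Empty:
                return items

    def stop(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=2.0)


async def fake_discovery():
    """Stand-in for a network coroutine such as SSDP discovery."""
    await asyncio.sleep(0.01)
    return ["MiniDLNA"]
```

In a real main loop the UI thread would call `drain()` each frame so that it never blocks on network work; a blocking `results.get(timeout=...)` is only appropriate in tests.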

## Completed Tasks

- Phase 1: Project bootstrap (`pyproject.toml`, `requirements.txt`, `README.md`, package layout under `src/`)
- Phase 2: DLNA discovery (`dlna/discovery.py` — SSDP M-SEARCH, friendly-name fetch) and browsing (`dlna/client.py` — SOAP Browse, DIDL-Lite parser with relative-URL resolution + `dlna/models.py` domain models + `dlna/browser_state.py` navigation stack/cache)
- Phase 3: SDL2 UI (`ui/sdl_app.py` — window, event loop, input dispatch; `ui/screens.py` — server list, browse list, playback, error screens; `ui/theme.py` — runtime-scaled layout helpers for 640x480 and 720x720-class displays)
- Phase 4: Playback (`player/backend.py` abstract interface + `player/gstreamer_backend.py` integrated GStreamer backend)
- Phase 5: Device integration (`platform/controls.py` — keyboard + gamecontroller mapping; `platform/runtime.py` — logging, R36S heuristic, SDL env hints)
- Phase 7: Tests — 75 tests across 7 test files all passing (DIDL mapping, SOAP/XML parser, navigation state, playback backend, SDL redraw policy, input controls, runtime environment setup)
- Desktop runtime verification completed: fixed SSDP discovery socket setup for IPv4 and removed pending-task shutdown noise from the async worker thread
- Packaging hardening: bundled a local UI font asset and configured setuptools to ship it with the package
- Real LAN regression fixed: Browse SOAP parser now handles `Result` elements both with and without a namespace, matching responses from the discovered MiniDLNA/Jellyfin servers
- Playback backend pivoted to GStreamer because libmpv continued to create a separate native window on the desktop host instead of remaining embedded in the SDL UI
- Milestone 2 is now implemented with GStreamer: playback uses `GstPlayBin` plus `GstAppSink` instead of an external player, libmpv, or native overlay sinks
- SDL playback flow updated: decoded GStreamer frames are uploaded into SDL textures, playback end-of-stream returns automatically to the browser, and playback controls support pause/resume, relative seek, and volume
- Milestone 3 implemented in code: SDL scales video into a dedicated viewport inside the SDL window, with reserved HUD margins instead of using the whole window area for video
- Playback HUD expanded: progress bar, elapsed/duration, volume, buffer, resolution, and control legends are rendered around the video area and updated from GStreamer bus/pipeline queries
- Playback-page flashing root cause addressed by removing native overlay composition entirely: video and HUD are now rendered together by SDL in one pass, with redraws driven by decoded frame availability and HUD state changes
- Playback HUD simplified: the border around the video area was removed, playback control/status icons were added as bundled SVG+PNG assets, the title/timer top bar no longer overlaps, and playback now supports `auto / fixed / hidden` HUD modes through a dedicated command while staying visible when paused
- UI scaling hardened for mixed small-display targets: list rows, HUD bands, icon sizes, viewport margins, and font sizes are now derived from the actual SDL window/display size so the app remains readable on both 640x480 and 720x720 screens
- Deployment assets added: `.gitignore`, `environment.yml`, and a real `LICENSE` file so the project can be initialized and published as a clean git repository
- Conda environment refreshed for current playback needs: runtime now includes GStreamer codec/plugin packages plus explicit Python build/test tooling, while editable install keeps the package code sourced from the repo checkout
- Packaging fix: `pyproject.toml` now uses a valid TOML `[project.urls]` table so editable installs work with modern `pip` / `tomllib`
- Copilot instructions and this status file
- Device deployment reconnaissance completed on a real ArkOS-derived R36S over SSH: `/roms` is full, `/home/ark` has free space, required download tools are present, and `/roms/ports` plus `gamelist.xml` are the least invasive integration points for launchers
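
The Browse SOAP regression noted above (`Result` elements arriving both with and without a namespace) can be illustrated with a namespace-agnostic lookup. This is a sketch under assumptions, not the code in `dlna/client.py`: `local_name` and `extract_browse_result` are hypothetical helpers built on the standard-library `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_browse_result(soap_xml):
    """Return the text of the first <Result> element, namespaced or not.

    Returns None when the response contains no Result element at all.
    """
    root = ET.fromstring(soap_xml)
    for element in root.iter():
        if local_name(element.tag) == "Result":
            return element.text or ""
    return None
```

Matching on the local name avoids hard-coding one namespace URI per server. The trade-off is that an unrelated element named `Result` elsewhere in the envelope would also match.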

## Tasks In Progress

- **NV12 frame path optimization complete**: a `GstBin` that applies `videoscale` (nearest-neighbour) down to 640×480 reduces the Python memmove from about 32 ms (77% of the frame budget) to about 1 ms (2.5%) with no FPS or drop regression. Awaiting a visual smoke test on device via the `MatHacks.sh` launcher.
- Verify that the SDL-texture playback path is smooth enough on real host playback and on R36S hardware
- Device deployment on the physical R36S is now wired through ArkOS `Ports -> MatHacks`, with the heavy runtime under `/home/ark` and only a lightweight stub launcher under `/roms/ports`

## NV12 Render Path Benchmark Log

All runs performed on the physical R36S (RK3326, 4× A35 @ 1.3 GHz, 1 GB RAM) over SSH.
Stream: 1920×1080 H.264 MKV @ 24 fps via MiniDLNA over LAN. Frame budget: 41.7 ms.

| Commit | Copy / pipeline strategy | Copy mean | Copy % budget | FPS | Dropped | A/V drift |
|--------|--------------------------|-----------|---------------|-----|---------|-----------|
| `a201594` | `extract_dup` → bytes + `from_buffer_copy` → ctypes (2 copies, 6 MB/frame) | 36,499 µs | 87.6% | 24.01 | 1 | −42.8 ms |
| `da02e74` | `buffer.map(READ)` + `memmove` into reusable ctypes array (1 copy, 3.1 MB/frame) | 33,551 µs | 80.5% | 23.98 | 0 | −38.0 ms |
| `995830e` | `videoscale(nearest)→640×480` in GstBin + `memmove` (1 copy, **0.46 MB/frame**) | **1,033 µs** | **2.5%** | **23.99** | **0** | **−6.9 ms** |
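
The "Copy % budget" column and the per-frame sizes in the table can be reproduced with a few lines of arithmetic. The sketch below assumes the 24 fps frame budget stated above and the standard NV12 layout (a full-resolution Y plane plus a half-size interleaved UV plane); the function names are illustrative only.

```python
FPS = 24
FRAME_BUDGET_US = 1_000_000 / FPS  # about 41,667 µs per frame


def nv12_frame_bytes(width, height):
    """Bytes in one NV12 frame: Y plane plus half-size interleaved UV plane."""
    return width * height * 3 // 2


def budget_percent(copy_mean_us):
    """Share of the per-frame budget consumed by a copy of the given duration."""
    return 100.0 * copy_mean_us / FRAME_BUDGET_US


full_hd = nv12_frame_bytes(1920, 1080)  # 3,110,400 bytes, about 3.1 MB
scaled = nv12_frame_bytes(640, 480)     # 460,800 bytes, about 0.46 MB
size_ratio = full_hd / scaled           # 6.75
```

For example, `budget_percent(33551)` gives about 80.5 and `budget_percent(1033)` about 2.5, matching the second and third rows.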

**Optimization history:**

- `a201594` → `da02e74`: replaced `extract_dup + from_buffer_copy` (2 copies, 6 MB/frame) with `buffer.map(READ) + memmove` into a pre-allocated ctypes array (1 copy, 3.1 MB). Saved ~3 MB/frame allocation; copy cost reduced by 8% but still ~81% of budget.

- `da02e74` → `995830e`: identified that the 3.1 MB memmove is necessary only because the appsink receives full 1920×1080 frames, while the display is 640×480. Inserted a `GstBin` containing `videoscale(method=nearest-neighbour) → capsfilter(NV12,640×480) → appsink` as the playbin video-sink. This causes the GStreamer pipeline thread to do SW scale before Python sees the frame; Python then receives only 460 KB (6.7× smaller). Memmove drops from 32 ms to 1 ms (31× improvement, 2.5% budget). FPS and drop count are unchanged (23.99, 0). A/V drift improved from −38 ms to −7 ms.
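
The single-copy strategy described above (a `memmove` into a reusable, pre-allocated ctypes array) can be sketched without GStreamer. In this illustration a plain `bytes` object stands in for the mapped buffer data, which is an assumption made to keep the example self-contained; `FrameStore` and `copy_from` are hypothetical names.

```python
import ctypes

FRAME_BYTES = 640 * 480 * 3 // 2  # one NV12 frame at the scaled resolution


class FrameStore:
    """Holds one pre-allocated ctypes buffer that is overwritten for every frame."""

    def __init__(self, size):
        self.size = size
        # Allocated once; no per-frame allocation happens after this point.
        self.buffer = (ctypes.c_ubyte * size)()

    def copy_from(self, mapped_data):
        """Copy one frame from a bytes object; returns the number of bytes copied."""
        count = min(len(mapped_data), self.size)
        # ctypes accepts a bytes object directly as the memmove source pointer.
        ctypes.memmove(self.buffer, mapped_data, count)
        return count
```

Because the destination array is reused, the only per-frame cost is the copy itself, which is why shrinking the frame before it reaches Python has such a large effect.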

**Alternatives tested and rejected during `995830e`:**

| Variant | Result | Root cause |
|---------|--------|-----------|
| Bilinear videoscale (no queue) | 20.92 fps, 46 drops | Bilinear reads adjacent rows → loads ~89% of source cache lines, similar cost to memmove; scheduling pressure causes drops |
| Nearest-neighbour + leaky=2 queue | 1.86 fps, 30 drops | `leaky=2` allows mppvideodec to race ahead; queue fills and drops ~93% of frames as stale |
| Nearest-neighbour, no queue | **23.99 fps, 0 drops** ✅ | Nearest reads ~44% of source cache lines; back-pressure from appsink naturally rate-limits mppvideodec |
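
One plausible way to arrive at the ~44% and ~89% cache-line figures in the table is to count which source rows each scaler touches when going from 1080 to 480 rows. The sketch below is an assumption about the estimate, not a measurement: it models nearest-neighbour as one source row per output row and bilinear as two adjacent rows, and it assumes every cache line in a touched row is read (reasonable for a 3× horizontal downscale with 64-byte lines).

```python
import math


def touched_row_fraction(src_rows, dst_rows, taps):
    """Fraction of source rows read during a vertical downscale.

    taps=1 models nearest-neighbour, taps=2 models bilinear.
    Uses centre-aligned sampling, which is an assumption of this sketch.
    """
    step = src_rows / dst_rows
    touched = set()
    for y in range(dst_rows):
        base = math.floor((y + 0.5) * step - 0.5 * (taps - 1))
        for offset in range(taps):
            # Clamp to the valid row range at the frame edges.
            touched.add(min(max(base + offset, 0), src_rows - 1))
    return len(touched) / src_rows


nearest = touched_row_fraction(1080, 480, taps=1)
bilinear = touched_row_fraction(1080, 480, taps=2)
```

Under these assumptions nearest-neighbour touches 480 of 1080 rows and bilinear touches 960, roughly doubling the memory traffic.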

**Key observations (`995830e`):**
- Memmove reduced from 32 ms (3.1 MB) to ~1 ms (460 KB) — 31× improvement
- No FPS or drop regression vs unscaled path
- A/V drift improved significantly (−7 ms vs −38 ms)
- SW nearest-neighbour scale on A35 costs ~14 ms per frame (estimated from cache line count), but this happens synchronously in the GStreamer pipeline thread BEFORE the appsink callback, not in the Python memmove measurement
- Remaining 97.5% of frame budget is available for SDL upload, HUD rendering, and other pipeline work

## Blockers Or Open Questions

- `SDL2_ttf` system library needed for text rendering (`sudo dnf install SDL2_ttf` on Fedora, `sudo apt install libsdl2-ttf-2.0-0` on Debian/Ubuntu). The app handles its absence gracefully but will show no text.
- Integrated playback requires system GStreamer plus Python GI bindings (for Fedora: `python3-gobject gstreamer1 gstreamer1-plugins-base gstreamer1-plugins-good`; add codec/plugin packages as needed for target media).
- Root browse verified against two real DLNA servers on the LAN.
- On-device testing on R36S hardware is pending.
- The current SDL-texture path avoids window-manager dependencies but may still need optimization on low-end hardware if BGRA upload cost is too high.
- The first Miniforge install attempt on the physical R36S failed because the downloaded installer was corrupt and crashed during extraction.
- The physical R36S now has Miniconda installed at `/home/ark/miniconda3`; the dedicated app env exists at `/home/ark/miniconda3/envs/r36s-dlna-browser`, but package solves can hang on-device and are being handled incrementally.
- The dedicated R36S conda env requires `LD_LIBRARY_PATH=/home/ark/miniconda3/envs/r36s-dlna-browser/lib` for GI and GStreamer shared libraries to resolve correctly.
- GStreamer imports now succeed in the dedicated env (`GLib`, `GObject`, `Gst`, `GstApp`, `GstVideo`), and `Application` imports cleanly.
- ArkOS menu launch works on the physical device, and DLNA browsing reaches real MiniDLNA content.
- Real playback is currently blocked by missing decoder elements in the device env: direct probing of a MiniDLNA `.mkv` URL showed missing H.264 High Profile and MPEG-4 AAC decoders, while the user-facing "can't play a text file" message is a misleading fallback caused by an additional text stream in the container.
- **RESOLVED**: `gst-libav` conda package on `linux-aarch64` has an unfixable ABI mismatch: `libavcodec.so` links `libdav1d.so.6` (from dav1d <1.3) but only dav1d 1.4.x (`.so.7`) is available, and via `libxml2-16` it also pulls `libicuuc.so.78` which is not packaged for linux-aarch64 on conda-forge. **Solution**: install system `gstreamer1.0-libav` (v1.16.1) via apt and use `GST_PLUGIN_PATH` + `LD_PRELOAD` to expose its plugins to the conda Python runtime.
- On the physical R36S, `avdec_h264` and `avdec_aac` now register and resolve when launched with:
  `LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1 GST_PLUGIN_PATH=/usr/lib/aarch64-linux-gnu/gstreamer-1.0`
  The `LD_PRELOAD` is required to avoid "cannot allocate memory in static TLS block" from the conda `libgomp.so` being loaded late by dlopen.
- These variables are now persisted in `/home/ark/miniconda3/envs/r36s-dlna-browser/etc/conda/activate.d/gst-env.sh` and explicitly set in `deploy/run.sh`.
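
The environment described above can also be assembled programmatically when launching the app as a child process. This is a sketch only: the paths are the ones quoted in this section, while `build_launch_env` is a hypothetical helper and not part of `deploy/run.sh`.

```python
import os

ENV_LIB = "/home/ark/miniconda3/envs/r36s-dlna-browser/lib"
SYSTEM_GOMP = "/usr/lib/aarch64-linux-gnu/libgomp.so.1"
SYSTEM_GST_PLUGINS = "/usr/lib/aarch64-linux-gnu/gstreamer-1.0"


def build_launch_env(base=None):
    """Return a copy of the environment with the GStreamer-related variables set."""
    env = dict(os.environ if base is None else base)
    # Preload the system libgomp so it is not loaded late by dlopen.
    env["LD_PRELOAD"] = SYSTEM_GOMP
    # Expose the system gstreamer1.0-libav plugins to the conda runtime.
    env["GST_PLUGIN_PATH"] = SYSTEM_GST_PLUGINS
    # Put the conda env's lib directory first, keeping any existing entries.
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if ENV_LIB not in parts:
        parts.insert(0, ENV_LIB)
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env
```

`LD_PRELOAD` only takes effect at process start, so the returned mapping has to be passed to a new process (for example via the `env` argument of `subprocess.run`) rather than assigned to `os.environ` inside an already running interpreter.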

## Next Recommended Actions

1. Run a visual playback smoke test on device directly via the app launcher (MatHacks.sh) to confirm HUD and video render correctly together under KMSDRM with the videoscale path active (nearest-neighbour 640×480 NV12).
2. Measure `SDL_UpdateNVTexture` upload cost for the now-smaller 640×480 texture (previously 1920×1080). If it is sub-millisecond, the render path is considered optimized.
3. If nearest-neighbour scaling looks noticeably poor on-device, switch to bilinear with `scale.set_property("method", 1)` and re-benchmark. The bilinear result above (20.92 fps, 46 drops) was measured on the benchmark stream only; actual app playback may behave differently because the GStreamer pipeline inside the real app differs slightly from the benchmark pipeline.
4. Consider profiling the SDL render loop under combined video+HUD load to confirm 30+ fps UI responsiveness alongside decoding.
5. Investigate DMA-buf import as a future zero-copy path: gst-mpp may expose DRM DMA-buf fds that could be imported for rendering under SDL's KMSDRM backend, most likely through a custom EGL path, since `SDL_CreateTextureFromSurface` copies pixels from a CPU-side `SDL_Surface` and cannot import a DMA-buf fd. This would eliminate the CPU memmove and SW scale entirely. It is a significant engineering effort and is not needed given current performance.
6. `avdec_hevc` is still missing (the system apt `gstreamer1.0-libav` 1.16.1 does not provide HEVC decoders); `mppvideodec` covers H.264/H.265/VP8/VP9 in hardware, so this is less critical now.
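
For the upload-cost measurement in step 2, a small timing harness is usually enough. The sketch below is generic and makes no SDL calls: `measure_mean_us` and `is_sub_millisecond` are hypothetical helpers, and the callable passed in would be whatever wraps the texture upload in the real app.

```python
import time
from statistics import mean


def measure_mean_us(fn, iterations=100):
    """Call fn repeatedly and return the mean wall-clock duration in microseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1000.0)
    return mean(samples)


def is_sub_millisecond(mean_us):
    """Apply the step 2 criterion: a mean cost below 1,000 µs."""
    return mean_us < 1000.0
```

Reporting the mean in microseconds keeps the result directly comparable with the "Copy mean" column of the benchmark table.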