# Development Status

## Current Milestone

Milestone 3 — SDL Video Viewport, HUD, and Wayland Compatibility

## Current Architecture Decisions

- **Language / UI**: Python 3.9+ with PySDL2 (ctypes wrapper around system SDL2), with UI layout metrics now computed from the runtime window/display size instead of being fixed to 640x480
- **DLNA Discovery**: Custom SSDP M-SEARCH implementation using asyncio datagrams + aiohttp for device description XML
- **Content Browsing**: Direct SOAP/XML ContentDirectory client with DIDL-Lite parser (no dependency on async-upnp-client browsing at runtime — only aiohttp)
- **Playback**: integrated GStreamer backend via `PyGObject` / `GstPlayBin`, decoding video into `GstAppSink` frames that are uploaded to SDL textures and rendered in the main SDL renderer
- **Playback viewport**: SDL scales decoded video into a dedicated playback viewport in the same render pass as the HUD, with full-width video bounds and a black playback backdrop outside the viewport
- **Concurrency**: Dedicated asyncio event loop in a daemon thread; thread-safe queues bridge it to the SDL2 main loop
- **Input**: Keyboard mapping for desktop testing + SDL2 GameController for R36S D-pad/buttons
- **Font**: Bundled package font preferred first; system font fallback kept only for development hosts
- **UI icons**: prefer bundled monochrome glyphs or a bundled icon font subset instead of depending on OS emoji fonts
- **Playback HUD**: SDL-rendered overlay is now simplified and compact, uses bundled playback icons, uses a smaller dedicated playback font with title ellipsis for 640x480 readability, supports auto-hide or fixed visibility, stays visible while playback is paused, and remains in the same SDL render pass as video
- **Wayland / DRM strategy**: playback no longer depends on native overlay sinks or X11 window handles; R36S-class targets continue to prefer `kmsdrm` when no display server is present
- **Deploy packaging**: a `conda-forge`-oriented `environment.yml`
now defines a reproducible Miniforge/Miniconda environment for local development and release preparation
- **Python packaging**: direct runtime dependency declarations now explicitly include `aiohttp` instead of relying on transitive installation through other packages
- **ArkOS deploy layout**: on-device installs should place Miniforge and the git checkout under `/home/ark` to avoid the full `/roms` partition, while EmulationStation integration should stay lightweight under `/roms/ports`

## Completed Tasks

- Phase 1: Project bootstrap (`pyproject.toml`, `requirements.txt`, `README.md`, package layout under `src/`)
- Phase 2: DLNA discovery (`dlna/discovery.py` — SSDP M-SEARCH, friendly-name fetch) and browsing (`dlna/client.py` — SOAP Browse, DIDL-Lite parser with relative-URL resolution + `dlna/models.py` domain models + `dlna/browser_state.py` navigation stack/cache)
- Phase 3: SDL2 UI (`ui/sdl_app.py` — window, event loop, input dispatch; `ui/screens.py` — server list, browse list, playback, error screens; `ui/theme.py` — runtime-scaled layout helpers for 640x480 and 720x720-class displays)
- Phase 4: Playback (`player/backend.py` abstract interface + `player/gstreamer_backend.py` integrated GStreamer backend)
- Phase 5: Device integration (`platform/controls.py` — keyboard + gamecontroller mapping; `platform/runtime.py` — logging, R36S heuristic, SDL env hints)
- Phase 7: Tests — 75 tests across 7 test files all passing (DIDL mapping, SOAP/XML parser, navigation state, playback backend, SDL redraw policy, input controls, runtime environment setup)
- Desktop runtime verification completed: fixed SSDP discovery socket setup for IPv4 and removed pending-task shutdown noise from the async worker thread
- Packaging hardening: bundled a local UI font asset and configured setuptools to ship it with the package
- Real LAN regression fixed: Browse SOAP parser now handles `Result` elements both with and without a namespace, matching responses from the discovered
MiniDLNA/Jellyfin servers
- Playback backend pivoted to GStreamer because libmpv continued to create a separate native window on the desktop host instead of remaining embedded in the SDL UI
- Milestone 2 is now implemented with GStreamer: playback uses `GstPlayBin` plus `GstAppSink` instead of an external player, libmpv, or native overlay sinks
- SDL playback flow updated: decoded GStreamer frames are uploaded into SDL textures, playback end-of-stream returns automatically to the browser, and playback controls support pause/resume, relative seek, and volume
- Milestone 3 implemented in code: SDL scales video into a dedicated viewport inside the SDL window, with reserved HUD margins instead of using the whole window area for video
- Playback HUD expanded: progress bar, elapsed/duration, volume, buffer, resolution, and control legends are rendered around the video area and updated from GStreamer bus/pipeline queries
- Playback-page flashing root cause addressed by removing native overlay composition entirely: video and HUD are now rendered together by SDL in one pass, with redraws driven by decoded frame availability and HUD state changes
- Playback HUD simplified: the border around the video area was removed, playback control/status icons were added as bundled SVG+PNG assets, the title/timer top bar no longer overlaps, and playback now supports `auto / fixed / hidden` HUD modes through a dedicated command while staying visible when paused
- UI scaling hardened for mixed small-display targets: list rows, HUD bands, icon sizes, viewport margins, and font sizes are now derived from the actual SDL window/display size so the app remains readable on both 640x480 and 720x720 screens
- Deployment assets added: `.gitignore`, `environment.yml`, and a real `LICENSE` file so the project can be initialized and published as a clean git repository
- Conda environment refreshed for current playback needs: runtime now includes GStreamer codec/plugin packages plus explicit Python
build/test tooling, while editable install keeps the package code sourced from the repo checkout
- Packaging fix: `pyproject.toml` now uses a valid TOML `[project.urls]` table so editable installs work with modern `pip` / `tomllib`
- Copilot instructions and this status file
- Device deployment reconnaissance completed on a real ArkOS-derived R36S over SSH: `/roms` is full, `/home/ark` has free space, required download tools are present, and `/roms/ports` plus `gamelist.xml` are the least invasive integration points for launchers

## Tasks In Progress

- **NV12 frame path optimization complete**: `videoscale(nearest-neighbour)→640×480` GstBin reduces Python memmove from 32 ms (77% budget) to 1 ms (2.5%) with no FPS or drop regression. Awaiting visual smoke test on device via MatHacks.sh launcher.
- Verify that the SDL-texture playback path is smooth enough on real host playback and on R36S hardware
- Device deployment on the physical R36S is now wired through ArkOS `Ports -> MatHacks`, with the heavy runtime under `/home/ark` and only a lightweight stub launcher under `/roms/ports`

## NV12 Render Path Benchmark Log

All runs performed on the physical R36S (RK3326, 4× A35 @ 1.3 GHz, 1 GB RAM) over SSH. Stream: 1920×1080 H.264 MKV @ 24 fps via MiniDLNA over LAN. Frame budget: 41.7 ms.
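
The 41.7 ms frame budget follows directly from the 24 fps stream, and the percent-of-budget figures in this log are a step's mean cost divided by that budget. A minimal sketch of the arithmetic; the helper names are illustrative and are not taken from the project source:

```python
# Illustrative helpers; these names are assumptions, not project code.
STREAM_FPS = 24  # frame rate of the test stream used for the benchmarks


def frame_budget_us(fps: float) -> float:
    """Time available per frame, in microseconds."""
    return 1_000_000 / fps


def percent_of_budget(cost_us: float, fps: float = STREAM_FPS) -> float:
    """Share of the per-frame budget consumed by one step, rounded to 0.1%."""
    return round(100 * cost_us / frame_budget_us(fps), 1)


print(round(frame_budget_us(STREAM_FPS) / 1000, 1))  # 41.7 (ms)
print(percent_of_budget(1_033))  # 2.5, the copy mean reported for 995830e
```

The budget is computed as 1000/24 ms (41.67 ms) and only rounded for display, which is why 36,499 µs comes out as 87.6% and not the 87.5% that a rounded 41.7 ms would give.
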

### GStreamer-only benchmark (no SDL)

| Commit | Copy / pipeline strategy | Copy mean | Copy % budget | FPS | Dropped | A/V drift |
|--------|--------------------------|-----------|---------------|-----|---------|-----------|
| `a201594` | `extract_dup` → bytes + `from_buffer_copy` → ctypes (2 copies, 6 MB/frame) | 36,499 µs | 87.6% | 24.01 | 1 | −42.8 ms |
| `da02e74` | `buffer.map(READ)` + `memmove` into reusable ctypes array (1 copy, 3.1 MB/frame) | 33,551 µs | 80.5% | 23.98 | 0 | −38.0 ms |
| `995830e` | `videoscale(nearest)→640×480` in GstBin + `memmove` (1 copy, **0.46 MB/frame**) | **1,033 µs** | **2.5%** | **23.99** | **0** | **−6.9 ms** |

### End-to-end SDL render loop (section 8 of `test_video_playback_device.py`)

**Commit `ac7aa91`** — real SDL window (720×720 KMSDRM), NV12 texture (640×480), same GstBin pipeline as the app:

| Phase | Mean | Max | % of 41.7ms budget |
|-------|------|-----|--------------------|
| memmove (GStreamer thread) | 1,168 µs | 3,655 µs | 2.8% |
| SDL_UpdateNVTexture (main thread) | 4,515 µs | 12,469 µs | 10.8% |
| SDL_RenderCopy + SDL_RenderPresent (main thread) | 4,508 µs | 17,892 µs | 10.8% |
| **Total (copy + upload + render)** | **10,191 µs** | — | **24.5%** |
| **FPS** | **24.03** | — | **0 dropped** |

**Key finding from section 8:**

- memmove is not the bottleneck (2.8% budget, 1.2ms mean).
- `SDL_UpdateNVTexture` for the 640×480 NV12 texture costs ~4.5ms mean (10.8%).
- `SDL_RenderPresent` costs ~4.5ms mean (10.8%) with spikes to 18ms (KMSDRM vsync stall).
- Total render overhead visible to the main thread: ~10ms, well within the 41.7ms budget.
- **The app-level desync is NOT caused by frame copy or SDL upload time. Root cause of desync: `SDL_RenderPresent` blocks the main thread for up to 18ms, which delays the GIL release and can starve the GStreamer callback thread.
This is a main-loop scheduling issue, not a per-frame cost issue.**
- 24.5% budget used in section 8 means ~31ms remains — sufficient for a HUD render pass on top of video.

**Optimization history:**

- `a201594` → `da02e74`: replaced `extract_dup + from_buffer_copy` (2 copies, 6 MB/frame) with `buffer.map(READ) + memmove` into a pre-allocated ctypes array (1 copy, 3.1 MB). Saved ~3 MB/frame allocation; copy cost reduced by 8% but still ~81% of budget.
- `da02e74` → `995830e`: identified that the 3.1 MB memmove is necessary only because the appsink receives full 1920×1080 frames, while the display is 640×480. Inserted a `GstBin` containing `videoscale(method=nearest-neighbour) → capsfilter(NV12,640×480) → appsink` as the playbin video-sink. This causes the GStreamer pipeline thread to do SW scale before Python sees the frame; Python then receives only 460 KB (6.7× smaller). Memmove drops from 32 ms to 1 ms (31× improvement, 2.5% budget). FPS and drop count are unchanged (23.99, 0). A/V drift improved from −38 ms to −7 ms.

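
The frame sizes quoted in the optimization history follow from the NV12 layout (a full-resolution Y plane plus a half-size interleaved UV plane, 1.5 bytes per pixel), and the single-copy strategy amounts to one `memmove` into a buffer that is allocated once. A minimal sketch under those assumptions; `nv12_frame_bytes` and `ReusableFrameBuffer` are hypothetical names, and a plain `bytes` object stands in for the mapped GStreamer buffer data:

```python
import ctypes


def nv12_frame_bytes(width: int, height: int) -> int:
    """NV12 size: Y plane (w*h) plus interleaved UV plane (w*h/2)."""
    return width * height * 3 // 2


class ReusableFrameBuffer:
    """Hypothetical helper: destination allocated once, reused for every frame."""

    def __init__(self, width: int, height: int) -> None:
        self.size = nv12_frame_bytes(width, height)
        self.array = (ctypes.c_uint8 * self.size)()

    def copy_from(self, data: bytes) -> None:
        # In the app the source would be the mapped buffer's data (assumption);
        # here any bytes object of the right length works.
        if len(data) != self.size:
            raise ValueError(f"expected {self.size} bytes, got {len(data)}")
        ctypes.memmove(self.array, data, self.size)


full = nv12_frame_bytes(1920, 1080)
scaled = nv12_frame_bytes(640, 480)
print(full, scaled, full / scaled)  # 3110400 460800 6.75
```

Reusing one destination array means the per-frame cost is the copy alone, which is why shrinking the frame before it reaches Python reduces the cost roughly in proportion to the byte count.
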

**Alternatives tested and rejected during `995830e`:**

| Variant | Result | Root cause |
|---------|--------|-----------|
| Bilinear videoscale (no queue) | 20.92 fps, 46 drops | Bilinear reads adjacent rows → loads ~89% of source cache lines, similar cost to memmove; scheduling pressure causes drops |
| Nearest-neighbour + leaky=2 queue | 1.86 fps, 30 drops | `leaky=2` allows mppvideodec to race ahead; queue fills and drops ~93% of frames as stale |
| Nearest-neighbour, no queue | **23.99 fps, 0 drops** ✅ | Nearest reads ~44% of source cache lines; back-pressure from appsink naturally rate-limits mppvideodec |

**Key observations (`995830e`):**

- Memmove reduced from 32 ms (3.1 MB) to ~1 ms (460 KB) — 31× improvement
- No FPS or drop regression vs unscaled path
- A/V drift improved significantly (−7 ms vs −38 ms)
- SW nearest-neighbour scale on A35 costs ~14 ms per frame (estimated from cache line count), but this happens synchronously in the GStreamer pipeline thread BEFORE the appsink callback, not in the Python memmove measurement
- Remaining 97.5% of frame budget is available for SDL upload, HUD rendering, and other pipeline work

## Blockers Or Open Questions

- `SDL2_ttf` system library needed for text rendering (`sudo dnf install SDL2_ttf` on Fedora, `sudo apt install libsdl2-ttf-2.0-0` on Debian/Ubuntu). The app handles its absence gracefully but will show no text.
- Integrated playback requires system GStreamer plus Python GI bindings (for Fedora: `python3-gobject gstreamer1 gstreamer1-plugins-base gstreamer1-plugins-good`; add codec/plugin packages as needed for target media).
- Root browse verified against two real DLNA servers on the LAN.
- On-device testing on R36S hardware is pending.
- The current SDL-texture path avoids window-manager dependencies but may still need optimization on low-end hardware if BGRA upload cost is too high.
- The first Miniforge install attempt on the physical R36S failed because the downloaded installer was corrupt and crashed during extraction.
- The physical R36S now has Miniconda installed at `/home/ark/miniconda3`; the dedicated app env exists at `/home/ark/miniconda3/envs/r36s-dlna-browser`, but package solves can hang on-device and are being handled incrementally.
- The dedicated R36S conda env requires `LD_LIBRARY_PATH=/home/ark/miniconda3/envs/r36s-dlna-browser/lib` for GI and GStreamer shared libraries to resolve correctly.
- GStreamer imports now succeed in the dedicated env (`GLib`, `GObject`, `Gst`, `GstApp`, `GstVideo`), and `Application` imports cleanly.
- ArkOS menu launch works on the physical device, and DLNA browsing reaches real MiniDLNA content.
- Real playback is currently blocked by missing decoder elements in the device env: direct probing of a MiniDLNA `.mkv` URL showed missing H.264 High Profile and MPEG-4 AAC decoders, while the user-facing "can't play a text file" message is a misleading fallback caused by an additional text stream in the container.
- **RESOLVED**: `gst-libav` conda package on `linux-aarch64` has an unfixable ABI mismatch: `libavcodec.so` links `libdav1d.so.6` (from dav1d <1.3) but only dav1d 1.4.x (`.so.7`) is available, and via `libxml2-16` it also pulls `libicuuc.so.78` which is not packaged for linux-aarch64 on conda-forge. **Solution**: install system `gstreamer1.0-libav` (v1.16.1) via apt and use `GST_PLUGIN_PATH` + `LD_PRELOAD` to expose its plugins to the conda Python runtime.
- On the physical R36S, `avdec_h264` and `avdec_aac` now register and resolve when launched with:
  `LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1 GST_PLUGIN_PATH=/usr/lib/aarch64-linux-gnu/gstreamer-1.0`
  The `LD_PRELOAD` is required to avoid "cannot allocate memory in static TLS block" from the conda `libgomp.so` being loaded late by dlopen.
- These variables are now persisted in `/home/ark/miniconda3/envs/r36s-dlna-browser/etc/conda/activate.d/gst-env.sh` and explicitly set in `deploy/run.sh`.

## Next Recommended Actions

1. **Investigate SDL_RenderPresent blocking** — the 18ms spike in `SDL_RenderPresent` (KMSDRM vsync stall) is the likely root cause of sync jitter in the full app. Options:
   - Move the render call off the main thread into a dedicated render thread, giving the GStreamer callback thread uncontested GIL access.
   - Or call `SDL_RenderSetVSync(renderer, 0)` (the SDL2 name for this call, available from SDL 2.0.18) to disable vsync and drive timing manually from GStreamer PTS, at the cost of tearing risk.
   - Or cap renders to only happen when `has_new_frame()` is true and otherwise sleep shorter intervals to avoid the long blocking RenderPresent.
2. Run a visual smoke test via MatHacks.sh launcher to confirm HUD renders cleanly alongside video under KMSDRM.
3. SDL_UpdateNVTexture for 640×480 NV12 costs ~4.5ms mean — acceptable. No further optimization needed here.
4. `avdec_hevc` is still missing; `mppvideodec` handles HEVC via HW so this is not critical.
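
The third option under action 1 (render only when `has_new_frame()` is true, otherwise sleep briefly) could be sketched as the loop below. This is an illustration under stated assumptions, not the app's actual main loop: `run_render_loop`, `render_frame`, `should_quit`, and `idle_sleep_s` are hypothetical names, and HUD-driven redraws are left out for brevity.

```python
import time
from typing import Callable


def run_render_loop(
    has_new_frame: Callable[[], bool],
    render_frame: Callable[[], None],
    should_quit: Callable[[], bool],
    idle_sleep_s: float = 0.002,  # assumed idle interval, not a measured value
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Render only when a decoded frame is waiting; otherwise yield briefly.

    Returns the number of frames rendered.
    """
    rendered = 0
    while not should_quit():
        if has_new_frame():
            # Upload + present happen here; this is the only blocking call.
            render_frame()
            rendered += 1
        else:
            # time.sleep releases the GIL, so other Python threads
            # (such as a frame-delivery callback) can run meanwhile.
            sleep(idle_sleep_s)
    return rendered
```

The `sleep` parameter exists only so the loop can be exercised without real delays; in the app the default `time.sleep` would be used.
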