Development Status

Current Milestone

Milestone 3 — SDL Video Viewport, HUD, and Wayland Compatibility

Current Architecture Decisions

Language / UI: Python 3.9+ with PySDL2 (ctypes wrapper around system SDL2), with UI layout metrics now computed from the runtime window/display size instead of being fixed to 640x480
DLNA Discovery: Custom SSDP M-SEARCH implementation using asyncio datagrams + aiohttp for device description XML
Content Browsing: Direct SOAP/XML ContentDirectory client with DIDL-Lite parser (no dependency on async-upnp-client browsing at runtime — only aiohttp)
Playback: integrated GStreamer backend via PyGObject / GstPlayBin, decoding video into GstAppSink frames that are uploaded to SDL textures and rendered in the main SDL renderer
Playback viewport: SDL scales decoded video into a dedicated playback viewport in the same render pass as the HUD, with full-width video bounds and a black playback backdrop outside the viewport
Concurrency: Dedicated asyncio event loop in a daemon thread; thread-safe queues bridge it to the SDL2 main loop
Input: Keyboard mapping for desktop testing + SDL2 GameController for R36S D-pad/buttons
Font: Bundled package font preferred first; system font fallback kept only for development hosts
UI icons: prefer bundled monochrome glyphs or a bundled icon font subset instead of depending on OS emoji fonts
Playback HUD: SDL-rendered overlay is now simplified and compact, uses bundled playback icons, uses a smaller dedicated playback font with title ellipsis for 640x480 readability, supports auto-hide or fixed visibility, stays visible while playback is paused, and remains in the same SDL render pass as video
Wayland / DRM strategy: playback no longer depends on native overlay sinks or X11 window handles; R36S-class targets continue to prefer kmsdrm when no display server is present
Deploy packaging: a conda-forge-oriented environment.yml now defines a reproducible Miniforge/Miniconda environment for local development and release preparation
Python packaging: direct runtime dependency declarations now explicitly include aiohttp instead of relying on transitive installation through other packages
ArkOS deploy layout: on-device installs should place Miniforge and the git checkout under /home/ark to avoid the full /roms partition, while EmulationStation integration should stay lightweight under /roms/ports
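
The concurrency decision above (an asyncio loop in a daemon thread, bridged to the SDL2 main loop by thread-safe queues) can be sketched with the standard library alone. The class and method names below (AsyncWorker, submit, poll, stop) are hypothetical and are not taken from the project source; this is one possible shape under those assumptions, not the actual implementation.

```python
import asyncio
import queue
import threading


class AsyncWorker:
    """Hypothetical helper: asyncio loop in a daemon thread, results via a queue."""

    def __init__(self):
        self.results = queue.Queue()  # thread-safe bridge to the UI thread
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    def submit(self, tag, coro):
        """Schedule a coroutine from the UI thread; its outcome lands in results."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        future.add_done_callback(lambda done: self._publish(tag, done))

    def _publish(self, tag, future):
        try:
            self.results.put((tag, future.result()))
        except Exception as exc:  # hand failures to the UI instead of losing them
            self.results.put((tag, exc))

    def poll(self):
        """Non-blocking drain, intended to be called once per UI frame."""
        items = []
        while True:
            try:
                items.append(self.results.get_nowait())
            except queue.Empty:
                return items

    def stop(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=2)
```

The UI loop never awaits anything: it calls poll() each frame and reacts to whatever has arrived, so a slow SSDP or SOAP request cannot stall rendering.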

Completed Tasks

Phase 1: Project bootstrap (pyproject.toml, requirements.txt, README.md, package layout under src/)
Phase 2: DLNA discovery (dlna/discovery.py — SSDP M-SEARCH, friendly-name fetch) and browsing (dlna/client.py — SOAP Browse, DIDL-Lite parser with relative-URL resolution + dlna/models.py domain models + dlna/browser_state.py navigation stack/cache)
Phase 3: SDL2 UI (ui/sdl_app.py — window, event loop, input dispatch; ui/screens.py — server list, browse list, playback, error screens; ui/theme.py — runtime-scaled layout helpers for 640x480 and 720x720-class displays)
Phase 4: Playback (player/backend.py abstract interface + player/gstreamer_backend.py integrated GStreamer backend)
Phase 5: Device integration (platform/controls.py — keyboard + gamecontroller mapping; platform/runtime.py — logging, R36S heuristic, SDL env hints)
Phase 7: Tests — 75 tests across 7 test files all passing (DIDL mapping, SOAP/XML parser, navigation state, playback backend, SDL redraw policy, input controls, runtime environment setup)
Desktop runtime verification completed: fixed SSDP discovery socket setup for IPv4 and removed pending-task shutdown noise from the async worker thread
Packaging hardening: bundled a local UI font asset and configured setuptools to ship it with the package
Real LAN regression fixed: Browse SOAP parser now handles Result elements both with and without a namespace, matching responses from the discovered MiniDLNA/Jellyfin servers
Playback backend pivoted to GStreamer because libmpv continued to create a separate native window on the desktop host instead of remaining embedded in the SDL UI
Milestone 2 is now implemented with GStreamer: playback uses GstPlayBin plus GstAppSink instead of an external player, libmpv, or native overlay sinks
SDL playback flow updated: decoded GStreamer frames are uploaded into SDL textures, playback end-of-stream returns automatically to the browser, and playback controls support pause/resume, relative seek, and volume
Milestone 3 implemented in code: SDL scales video into a dedicated viewport inside the SDL window, with reserved HUD margins instead of using the whole window area for video
Playback HUD expanded: progress bar, elapsed/duration, volume, buffer, resolution, and control legends are rendered around the video area and updated from GStreamer bus/pipeline queries
Playback-page flashing root cause addressed by removing native overlay composition entirely: video and HUD are now rendered together by SDL in one pass, with redraws driven by decoded frame availability and HUD state changes
Playback HUD simplified: the border around the video area was removed, playback control/status icons were added as bundled SVG+PNG assets, the title/timer top bar no longer overlaps, and playback now supports auto / fixed / hidden HUD modes through a dedicated command while staying visible when paused
UI scaling hardened for mixed small-display targets: list rows, HUD bands, icon sizes, viewport margins, and font sizes are now derived from the actual SDL window/display size so the app remains readable on both 640x480 and 720x720 screens
Deployment assets added: .gitignore, environment.yml, and a real LICENSE file so the project can be initialized and published as a clean git repository
Conda environment refreshed for current playback needs: runtime now includes GStreamer codec/plugin packages plus explicit Python build/test tooling, while editable install keeps the package code sourced from the repo checkout
Packaging fix: pyproject.toml now uses a valid TOML [project.urls] table so editable installs work with modern pip / tomllib
Copilot instructions and this status file
Device deployment reconnaissance completed on a real ArkOS-derived R36S over SSH: /roms is full, /home/ark has free space, required download tools are present, and /roms/ports plus gamelist.xml are the least invasive integration points for launchers
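
The Browse parser fix noted above (Result elements with and without a namespace) can be illustrated with xml.etree.ElementTree. This is a minimal sketch, not the code in dlna/client.py: the function names are hypothetical, and matching on the local tag name is just one way to tolerate both response shapes.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core namespace used by DIDL-Lite


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_didl_result(soap_xml):
    """Return the text of the first Result element, namespaced or not, else None."""
    root = ET.fromstring(soap_xml)
    for element in root.iter():
        if local_name(element.tag) == "Result":
            return element.text
    return None


def parse_titles(didl_xml):
    """Collect every dc:title found in a DIDL-Lite payload."""
    root = ET.fromstring(didl_xml)
    return [node.text for node in root.iter(f"{{{DC_NS}}}title")]
```

Because the Result text is itself escaped XML, the parser unescapes it on read and the DIDL-Lite payload can then be parsed as a second document.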

Tasks In Progress

NV12 frame path optimization complete: a GstBin with videoscale(nearest-neighbour) → 640×480 reduces the Python memmove from about 32 ms (77% of the frame budget; the benchmark log below records 33.6 ms, 80.5%) to about 1 ms (2.5%), with no FPS or drop regression. Still awaiting a visual smoke test on the device via the MatHacks.sh launcher.
Verify that the SDL-texture playback path is smooth enough on real host playback and on R36S hardware
Device deployment on the physical R36S is now wired through ArkOS Ports -> MatHacks, with the heavy runtime under /home/ark and only a lightweight stub launcher under /roms/ports

NV12 Render Path Benchmark Log

All runs performed on the physical R36S (RK3326, 4× A35 @ 1.3 GHz, 1 GB RAM) over SSH. Stream: 1920×1080 H.264 MKV @ 24 fps via MiniDLNA over LAN. Frame budget: 41.7 ms.

GStreamer-only benchmark (no SDL)

Commit	Copy / pipeline strategy	Copy mean	Copy % budget	FPS	Dropped	A/V drift
`a201594`	`extract_dup` → bytes + `from_buffer_copy` → ctypes (2 copies, 6 MB/frame)	36,499 µs	87.6%	24.01	1	−42.8 ms
`da02e74`	`buffer.map(READ)` + `memmove` into reusable ctypes array (1 copy, 3.1 MB/frame)	33,551 µs	80.5%	23.98	0	−38.0 ms
`995830e`	`videoscale(nearest)→640×480` in GstBin + `memmove` (1 copy, 0.46 MB/frame)	1,033 µs	2.5%	23.99	0	−6.9 ms
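
The per-frame sizes in the table follow from the NV12 layout (a full-resolution 8-bit Y plane plus a half-resolution interleaved UV plane, 1.5 bytes per pixel). A small sketch of that arithmetic; the function name is illustrative only.

```python
def nv12_frame_bytes(width, height):
    """NV12: width*height luma bytes plus width*height/2 interleaved chroma bytes."""
    return width * height * 3 // 2


full_hd = nv12_frame_bytes(1920, 1080)  # source frame handed over by the decoder
scaled = nv12_frame_bytes(640, 480)     # frame after the in-pipeline videoscale

print(f"1920x1080 NV12: {full_hd / 1e6:.2f} MB per frame")
print(f"640x480 NV12:   {scaled / 1e6:.2f} MB per frame")
print(f"reduction:      {full_hd / scaled:.2f}x")
```

The two-copy strategy in the first row moves the full-resolution frame twice, which is where the roughly 6 MB per frame figure comes from.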

End-to-end SDL render loop (section 8 of `test_video_playback_device.py`)

Commit ac7aa91 — real SDL window (720×720 KMSDRM), NV12 texture (640×480), same GstBin pipeline as the app:

Phase	Mean	Max	% of 41.7ms budget
memmove (GStreamer thread)	1,168 µs	3,655 µs	2.8%
SDL_UpdateNVTexture (main thread)	4,515 µs	12,469 µs	10.8%
SDL_RenderCopy + SDL_RenderPresent (main thread)	4,508 µs	17,892 µs	10.8%
Total (copy + upload + render)	10,191 µs	—	24.5%
FPS	24.03	—	0 dropped

Key findings from section 8:

memmove is not the bottleneck (2.8% budget, 1.2ms mean).
SDL_UpdateNVTexture for the 640×480 NV12 texture costs ~4.5ms mean (10.8%).
SDL_RenderPresent costs ~4.5ms mean (10.8%) with spikes to 18ms (KMSDRM vsync stall).
Total render overhead visible to the main thread: ~10ms, well within the 41.7ms budget.
The app-level desync is NOT caused by frame copy or SDL upload time. Root cause of desync: SDL_RenderPresent blocks the main thread for up to 18ms, which delays the GIL release and can starve the GStreamer callback thread. This is a main-loop scheduling issue, not a per-frame cost issue.
24.5% budget used in section 8 means ~31ms remains — sufficient for a HUD render pass on top of video.
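
The budget shares quoted in these findings can be re-derived from the section 8 means. The sketch below assumes the frame budget is exactly 1/24 s; the variable names are illustrative and do not come from test_video_playback_device.py.

```python
FRAME_BUDGET_US = 1_000_000 / 24  # about 41,667 µs per frame at 24 fps

# Mean timings copied from the section 8 table, in microseconds.
PHASE_MEAN_US = {
    "memmove": 1168,
    "SDL_UpdateNVTexture": 4515,
    "SDL_RenderCopy + SDL_RenderPresent": 4508,
}


def share_of_budget(cost_us):
    """Percentage of one frame interval consumed by cost_us."""
    return round(cost_us / FRAME_BUDGET_US * 100, 1)


total_us = sum(PHASE_MEAN_US.values())
headroom_us = FRAME_BUDGET_US - total_us

for phase, cost in PHASE_MEAN_US.items():
    print(f"{phase}: {cost} µs ({share_of_budget(cost)}%)")
print(f"total: {total_us} µs ({share_of_budget(total_us)}%)")
print(f"headroom: {headroom_us / 1000:.1f} ms")
```

These are means only; the maxima in the table show that individual frames can cost considerably more.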

Optimization history:

a201594 → da02e74: replaced extract_dup + from_buffer_copy (2 copies, 6 MB/frame) with buffer.map(READ) + memmove into a pre-allocated ctypes array (1 copy, 3.1 MB). Saved ~3 MB/frame allocation; copy cost reduced by 8% but still ~81% of budget.
da02e74 → 995830e: identified that the 3.1 MB memmove is necessary only because the appsink receives full 1920×1080 frames, while the display is 640×480. Inserted a GstBin containing videoscale(method=nearest-neighbour) → capsfilter(NV12,640×480) → appsink as the playbin video-sink. This causes the GStreamer pipeline thread to do SW scale before Python sees the frame; Python then receives only 460 KB (6.7× smaller). Memmove drops from 32 ms to 1 ms (31× improvement, 2.5% budget). FPS and drop count are unchanged (23.99, 0). A/V drift improved from −38 ms to −7 ms.
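
The sink bin described in the 995830e step can be written as a gst-launch style description string. This is a sketch under assumptions: the helper name and the appsink name are hypothetical, and the project may build the bin element by element rather than from a description.

```python
def build_scaled_sink_description(width=640, height=480, sink_name="frame_sink"):
    """Describe videoscale(nearest) -> NV12 caps at width x height -> appsink."""
    caps = f"video/x-raw,format=NV12,width={width},height={height}"
    return (
        "videoscale method=nearest-neighbour"
        f' ! capsfilter caps="{caps}"'
        f" ! appsink name={sink_name}"
    )


description = build_scaled_sink_description()
print(description)

# With PyGObject available, the string could be turned into a bin and installed
# as the playbin video sink, roughly like this (assumed usage, not run here):
#   sink_bin = Gst.parse_bin_from_description(description, True)
#   playbin.set_property("video-sink", sink_bin)
```

No queue element is included, which matches the variant that kept 23.99 fps with 0 drops in the comparison that follows.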

Alternatives tested and rejected during 995830e:

Variant	Result	Root cause
Bilinear videoscale (no queue)	20.92 fps, 46 drops	Bilinear reads adjacent rows → loads ~89% of source cache lines, similar cost to memmove; scheduling pressure causes drops
Nearest-neighbour + leaky=2 queue	1.86 fps, 30 drops	`leaky=2` allows mppvideodec to race ahead; queue fills and drops ~93% of frames as stale
Nearest-neighbour, no queue	23.99 fps, 0 drops ✅	Nearest reads ~44% of source cache lines; back-pressure from appsink naturally rate-limits mppvideodec

Key observations (995830e):

Memmove reduced from 32 ms (3.1 MB) to ~1 ms (460 KB) — 31× improvement
No FPS or drop regression vs unscaled path
A/V drift improved significantly (−7 ms vs −38 ms)
SW nearest-neighbour scale on A35 costs ~14 ms per frame (estimated from cache line count), but this happens synchronously in the GStreamer pipeline thread BEFORE the appsink callback, not in the Python memmove measurement
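The remaining 97.5% of the frame budget on the Python side is available for SDL upload, HUD rendering, and other work; the ~14 ms software scale runs in the GStreamer pipeline thread and is not included in this figure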

Blockers Or Open Questions

SDL2_ttf system library needed for text rendering (sudo dnf install SDL2_ttf on Fedora, sudo apt install libsdl2-ttf-2.0-0 on Debian/Ubuntu). The app handles its absence gracefully but will show no text.
Integrated playback requires system GStreamer plus Python GI bindings (for Fedora: python3-gobject gstreamer1 gstreamer1-plugins-base gstreamer1-plugins-good; add codec/plugin packages as needed for target media).
Root browse verified against two real DLNA servers on the LAN.
On-device testing on R36S hardware is pending.
The current SDL-texture path avoids window-manager dependencies but may still need optimization on low-end hardware if BGRA upload cost is too high.
The first Miniforge install attempt on the physical R36S failed because the downloaded installer was corrupt and crashed during extraction.
The physical R36S now has Miniconda installed at /home/ark/miniconda3; the dedicated app env exists at /home/ark/miniconda3/envs/r36s-dlna-browser, but package solves can hang on-device and are being handled incrementally.
The dedicated R36S conda env requires LD_LIBRARY_PATH=/home/ark/miniconda3/envs/r36s-dlna-browser/lib for GI and GStreamer shared libraries to resolve correctly.
GStreamer imports now succeed in the dedicated env (GLib, GObject, Gst, GstApp, GstVideo), and Application imports cleanly.
ArkOS menu launch works on the physical device, and DLNA browsing reaches real MiniDLNA content.
Real playback is currently blocked by missing decoder elements in the device env: direct probing of a MiniDLNA .mkv URL showed missing H.264 High Profile and MPEG-4 AAC decoders, while the user-facing "can't play a text file" message is a misleading fallback caused by an additional text stream in the container.
RESOLVED: gst-libav conda package on linux-aarch64 has an unfixable ABI mismatch: libavcodec.so links libdav1d.so.6 (from dav1d <1.3) but only dav1d 1.4.x (.so.7) is available, and via libxml2-16 it also pulls libicuuc.so.78 which is not packaged for linux-aarch64 on conda-forge. Solution: install system gstreamer1.0-libav (v1.16.1) via apt and use GST_PLUGIN_PATH + LD_PRELOAD to expose its plugins to the conda Python runtime.
On the physical R36S, avdec_h264 and avdec_aac now register and resolve when the app is launched with LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1 and GST_PLUGIN_PATH=/usr/lib/aarch64-linux-gnu/gstreamer-1.0. The LD_PRELOAD is required to avoid the "cannot allocate memory in static TLS block" error caused by the conda libgomp.so being loaded late by dlopen.
These variables are now persisted in /home/ark/miniconda3/envs/r36s-dlna-browser/etc/conda/activate.d/gst-env.sh and explicitly set in deploy/run.sh.
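
If the app is ever started from Python rather than from the shell script, the same variables could be assembled as an environment dict. This is a hypothetical helper using only the paths quoted above; the actual contents of gst-env.sh and deploy/run.sh are not shown in this file, so treat the layout as an assumption.

```python
import os

# Paths quoted in this status file for the physical R36S.
ENV_PREFIX = "/home/ark/miniconda3/envs/r36s-dlna-browser"
SYSTEM_LIB_DIR = "/usr/lib/aarch64-linux-gnu"


def build_launch_env(base_env=None):
    """Return a copy of the environment with the GStreamer-related overrides set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_LIBRARY_PATH"] = f"{ENV_PREFIX}/lib"  # GI and GStreamer shared libraries
    env["LD_PRELOAD"] = f"{SYSTEM_LIB_DIR}/libgomp.so.1"  # avoids the static TLS error
    env["GST_PLUGIN_PATH"] = f"{SYSTEM_LIB_DIR}/gstreamer-1.0"  # system gst-libav plugins
    return env


launch_env = build_launch_env({})
for key in sorted(launch_env):
    print(f"{key}={launch_env[key]}")
```

The resulting dict is the kind of value that subprocess.run(..., env=launch_env) accepts; note that this sketch overwrites any existing LD_LIBRARY_PATH rather than prepending to it.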

Next Recommended Actions

Investigate SDL_RenderPresent blocking — the 18ms spike in SDL_RenderPresent (KMSDRM vsync stall) is the likely root cause of sync jitter in the full app. Options:
- Move the render call off the main thread into a dedicated render thread, giving the GStreamer callback thread uncontested GIL access.
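- Or call SDL_RenderSetVSync(renderer, 0) to disable vsync and drive timing manually from GStreamer PTS, at the cost of tearing risk. (SDL_RenderSetVSync is the SDL2 name, available from SDL 2.0.18; SDL_SetRenderVSync is the SDL3 spelling.)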
- Or cap renders to only happen when has_new_frame() is true and otherwise sleep shorter intervals to avoid the long blocking RenderPresent.
Run a visual smoke test via MatHacks.sh launcher to confirm HUD renders cleanly alongside video under KMSDRM.
SDL_UpdateNVTexture for 640×480 NV12 costs ~4.5ms mean — acceptable. No further optimization needed here.
avdec_hevc is still missing; mppvideodec handles HEVC via HW so this is not critical.
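
The third option under the SDL_RenderPresent item (render only when has_new_frame() is true, otherwise sleep briefly) could look roughly like the loop below. Everything except the has_new_frame name is an assumption: the callbacks, the 2 ms idle sleep, and the injectable sleep function are illustrative and untested on the device.

```python
import time


def run_render_loop(has_new_frame, render, should_quit,
                    idle_sleep_s=0.002, sleep=time.sleep):
    """Render only when a decoded frame is waiting; otherwise take a short nap.

    Returns the number of frames rendered, which is handy for testing.
    """
    rendered = 0
    while not should_quit():
        if has_new_frame():
            render()  # texture upload + RenderCopy + RenderPresent in the real app
            rendered += 1
        else:
            sleep(idle_sleep_s)  # short sleep keeps the main thread responsive
    return rendered
```

Passing sleep as a parameter lets a test replace it with a counter, so the pacing logic can be checked without a display or a GStreamer pipeline.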
