SDL2/GStreamer DLNA browser for R36S by Matteo Benedetto

Development Status

Current Milestone

Milestone 3 — SDL Video Viewport, HUD, and Wayland Compatibility

Current Architecture Decisions

  • Language / UI: Python 3.9+ with PySDL2 (ctypes wrapper around system SDL2), with UI layout metrics now computed from the runtime window/display size instead of being fixed to 640x480
  • DLNA Discovery: Custom SSDP M-SEARCH implementation using asyncio datagrams + aiohttp for device description XML
  • Content Browsing: Direct SOAP/XML ContentDirectory client with DIDL-Lite parser (no dependency on async-upnp-client browsing at runtime — only aiohttp)
  • Playback: integrated GStreamer backend via PyGObject / GstPlayBin, decoding video into GstAppSink frames that are uploaded to SDL textures and rendered in the main SDL renderer
  • Playback viewport: SDL scales decoded video into a dedicated playback viewport in the same render pass as the HUD, with full-width video bounds and a black playback backdrop outside the viewport
  • Concurrency: Dedicated asyncio event loop in a daemon thread; thread-safe queues bridge it to the SDL2 main loop
  • Input: Keyboard mapping for desktop testing + SDL2 GameController for R36S D-pad/buttons
  • Font: Bundled package font preferred first; system font fallback kept only for development hosts
  • UI icons: prefer bundled monochrome glyphs or a bundled icon font subset instead of depending on OS emoji fonts
  • Playback HUD: SDL-rendered overlay is now simplified and compact, uses bundled playback icons, uses a smaller dedicated playback font with title ellipsis for 640x480 readability, supports auto-hide or fixed visibility, stays visible while playback is paused, and remains in the same SDL render pass as video
  • Wayland / DRM strategy: playback no longer depends on native overlay sinks or X11 window handles; R36S-class targets continue to prefer kmsdrm when no display server is present
  • Deploy packaging: a conda-forge-oriented environment.yml now defines a reproducible Miniforge/Miniconda environment for local development and release preparation
  • Python packaging: direct runtime dependency declarations now explicitly include aiohttp instead of relying on transitive installation through other packages
  • ArkOS deploy layout: on-device installs should place Miniforge and the git checkout under /home/ark to avoid the full /roms partition, while EmulationStation integration should stay lightweight under /roms/ports
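The concurrency decision above can be sketched as follows: an asyncio loop runs in a daemon thread, and thread-safe queues bridge it to the SDL2 main loop. This is a minimal illustration under stated assumptions, not the project's code. The class name AsyncWorker and its submit/drain/stop methods are hypothetical, and the real app may bridge results differently.

```python
import asyncio
import queue
import threading


class AsyncWorker:
    """Hypothetical bridge: asyncio loop in a daemon thread, results via a queue."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        # Thread-safe queue that the SDL main loop polls once per frame.
        self.results = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, tag, coro):
        """Schedule a coroutine from the SDL thread; its outcome is queued."""
        async def wrapper():
            try:
                self.results.put((tag, await coro, None))
            except Exception as exc:  # surface network/parse errors to the UI
                self.results.put((tag, None, exc))
        return asyncio.run_coroutine_threadsafe(wrapper(), self.loop)

    def drain(self):
        """Non-blocking: return every result queued since the last call."""
        items = []
        while True:
            try:
                items.append(self.results.get_nowait())
            except queue.Empty:
                return items

    def stop(self):
        # Stop the loop from its own thread, then close it once the thread exits.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join(timeout=2)
        if not self._thread.is_alive():
            self.loop.close()
```

Each SDL frame would call drain() and update the screens from the (tag, result, error) tuples. The main loop never blocks on network I/O.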

Completed Tasks

  • Phase 1: Project bootstrap (pyproject.toml, requirements.txt, README.md, package layout under src/)
  • Phase 2: DLNA discovery (dlna/discovery.py — SSDP M-SEARCH, friendly-name fetch) and browsing (dlna/client.py — SOAP Browse, DIDL-Lite parser with relative-URL resolution + dlna/models.py domain models + dlna/browser_state.py navigation stack/cache)
  • Phase 3: SDL2 UI (ui/sdl_app.py — window, event loop, input dispatch; ui/screens.py — server list, browse list, playback, error screens; ui/theme.py — runtime-scaled layout helpers for 640x480 and 720x720-class displays)
  • Phase 4: Playback (player/backend.py abstract interface + player/gstreamer_backend.py integrated GStreamer backend)
  • Phase 5: Device integration (platform/controls.py — keyboard + gamecontroller mapping; platform/runtime.py — logging, R36S heuristic, SDL env hints)
  • Phase 7: Tests — 75 tests across 7 test files all passing (DIDL mapping, SOAP/XML parser, navigation state, playback backend, SDL redraw policy, input controls, runtime environment setup)
  • Desktop runtime verification completed: fixed SSDP discovery socket setup for IPv4 and removed pending-task shutdown noise from the async worker thread
  • Packaging hardening: bundled a local UI font asset and configured setuptools to ship it with the package
  • Real LAN regression fixed: Browse SOAP parser now handles Result elements both with and without a namespace, matching responses from the discovered MiniDLNA/Jellyfin servers
  • Playback backend pivoted to GStreamer because libmpv continued to create a separate native window on the desktop host instead of remaining embedded in the SDL UI
  • Milestone 2 is now implemented with GStreamer: playback uses GstPlayBin plus GstAppSink instead of an external player, libmpv, or native overlay sinks
  • SDL playback flow updated: decoded GStreamer frames are uploaded into SDL textures, playback end-of-stream returns automatically to the browser, and playback controls support pause/resume, relative seek, and volume
  • Milestone 3 implemented in code: SDL scales video into a dedicated viewport inside the SDL window, with reserved HUD margins instead of using the whole window area for video
  • Playback HUD expanded: progress bar, elapsed/duration, volume, buffer, resolution, and control legends are rendered around the video area and updated from GStreamer bus/pipeline queries
  • Playback-page flashing root cause addressed by removing native overlay composition entirely: video and HUD are now rendered together by SDL in one pass, with redraws driven by decoded frame availability and HUD state changes
  • Playback HUD simplified: the border around the video area was removed, playback control/status icons were added as bundled SVG+PNG assets, the title/timer top bar no longer overlaps, and playback now supports auto / fixed / hidden HUD modes through a dedicated command while staying visible when paused
  • UI scaling hardened for mixed small-display targets: list rows, HUD bands, icon sizes, viewport margins, and font sizes are now derived from the actual SDL window/display size so the app remains readable on both 640x480 and 720x720 screens
  • Deployment assets added: .gitignore, environment.yml, and a real LICENSE file so the project can be initialized and published as a clean git repository
  • Conda environment refreshed for current playback needs: runtime now includes GStreamer codec/plugin packages plus explicit Python build/test tooling, while editable install keeps the package code sourced from the repo checkout
  • Packaging fix: pyproject.toml now uses a valid TOML [project.urls] table so editable installs work with modern pip / tomllib
  • Copilot instructions and this status file
  • Device deployment reconnaissance completed on a real ArkOS-derived R36S over SSH: /roms is full, /home/ark has free space, required download tools are present, and /roms/ports plus gamelist.xml are the least invasive integration points for launchers
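Two of the tasks above can be illustrated with the standard library alone: the Browse regression fix (a Result element with or without a namespace) and DIDL-Lite relative-URL resolution. This is a sketch, not the project's parser. The names extract_result and parse_didl are hypothetical. Resolving res URLs against the device description URL with urljoin is an assumption about what "relative-URL resolution" refers to.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

DIDL_NS = {
    "didl": "urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "upnp": "urn:schemas-upnp-org:metadata-1-0/upnp/",
}


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_result(soap_xml):
    """Return the escaped DIDL-Lite text from a BrowseResponse.

    Matching on the local name accepts both <Result> and <u:Result>.
    """
    root = ET.fromstring(soap_xml)
    for elem in root.iter():
        if local_name(elem.tag) == "Result":
            return elem.text or ""
    raise ValueError("BrowseResponse has no Result element")


def parse_didl(didl_xml, base_url):
    """Map DIDL-Lite containers/items to dicts, resolving relative res URLs."""
    entries = []
    for elem in ET.fromstring(didl_xml):
        kind = local_name(elem.tag)
        if kind not in ("container", "item"):
            continue
        title = elem.findtext("dc:title", default="", namespaces=DIDL_NS)
        res = elem.find("didl:res", DIDL_NS)
        url = None
        if res is not None and res.text:
            url = urljoin(base_url, res.text.strip())
        entries.append({"kind": kind, "id": elem.get("id"), "title": title, "url": url})
    return entries
```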

Tasks In Progress

  • NV12 frame path optimization complete: videoscale(nearest-neighbour)→640×480 GstBin reduces Python memmove from 32 ms (77% budget) to 1 ms (2.5%) with no FPS or drop regression. Awaiting visual smoke test on device via MatHacks.sh launcher.
  • Verify that the SDL-texture playback path is smooth enough on real host playback and on R36S hardware
  • Device deployment on the physical R36S is now wired through ArkOS Ports -> MatHacks, with the heavy runtime under /home/ark and only a lightweight stub launcher under /roms/ports

NV12 Render Path Benchmark Log

All runs performed on the physical R36S (RK3326, 4× A35 @ 1.3 GHz, 1 GB RAM) over SSH. Stream: 1920×1080 H.264 MKV @ 24 fps via MiniDLNA over LAN. Frame budget: 41.7 ms.

GStreamer-only benchmark (no SDL)

| Commit | Copy / pipeline strategy | Copy mean | Copy % of budget | FPS | Dropped | A/V drift |
|---|---|---|---|---|---|---|
| a201594 | extract_dup → bytes + from_buffer_copy → ctypes (2 copies, 6 MB/frame) | 36,499 µs | 87.6% | 24.01 | 1 | −42.8 ms |
| da02e74 | buffer.map(READ) + memmove into reusable ctypes array (1 copy, 3.1 MB/frame) | 33,551 µs | 80.5% | 23.98 | 0 | −38.0 ms |
| 995830e | videoscale(nearest)→640×480 in GstBin + memmove (1 copy, 0.46 MB/frame) | 1,033 µs | 2.5% | 23.99 | 0 | −6.9 ms |
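The frame-size and budget columns follow from simple arithmetic. The sketch below assumes NV12 frames of exactly width × height luma bytes plus half that again for the interleaved UV plane, with no decoder row padding.

```python
FPS = 24.0
BUDGET_US = 1_000_000 / FPS  # about 41,666.7 µs per frame at 24 fps


def nv12_bytes(width, height):
    # NV12: full-resolution Y plane + half-size interleaved UV plane.
    return width * height * 3 // 2


def budget_pct(copy_mean_us):
    """Share of the per-frame budget, rounded like the table (one decimal)."""
    return round(100 * copy_mean_us / BUDGET_US, 1)


full = nv12_bytes(1920, 1080)   # 3,110,400 bytes, about 3.1 MB
scaled = nv12_bytes(640, 480)   # 460,800 bytes, about 0.46 MB
ratio = full / scaled           # 6.75, the "6.7x smaller" figure

for commit, mean_us in [("a201594", 36_499), ("da02e74", 33_551), ("995830e", 1_033)]:
    print(commit, budget_pct(mean_us))
```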

End-to-end SDL render loop (section 8 of test_video_playback_device.py)

Commit ac7aa91 — real SDL window (720×720 KMSDRM), NV12 texture (640×480), same GstBin pipeline as the app:

| Phase | Mean | Max | % of 41.7 ms budget |
|---|---|---|---|
| memmove (GStreamer thread) | 1,168 µs | 3,655 µs | 2.8% |
| SDL_UpdateNVTexture (main thread) | 4,515 µs | 12,469 µs | 10.8% |
| SDL_RenderCopy + SDL_RenderPresent (main thread) | 4,508 µs | 17,892 µs | 10.8% |
| Total (copy + upload + render) | 10,191 µs | | 24.5% |

FPS: 24.03, 0 dropped.

Key finding from section 8:

  • memmove is not the bottleneck (2.8% budget, 1.2ms mean).
  • SDL_UpdateNVTexture for the 640×480 NV12 texture costs ~4.5ms mean (10.8%).
  • SDL_RenderPresent costs ~4.5ms mean (10.8%) with spikes to 18ms (KMSDRM vsync stall).
  • Total render overhead visible to the main thread: ~10ms, well within the 41.7ms budget.
  • The app-level desync is NOT caused by frame copy or SDL upload time. Suspected root cause (not yet confirmed): SDL_RenderPresent blocks the main thread for up to 18ms. This stalls the main loop and may delay the GIL release, starving the GStreamer callback thread. This is a main-loop scheduling issue, not a per-frame cost issue.
  • 24.5% budget used in section 8 means ~31ms remains — sufficient for a HUD render pass on top of video.
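The section 8 totals can be rechecked the same way. The sketch also computes a pessimistic bound by summing the per-phase maxima. That bound assumes the worst spikes coincide in a single frame, which the measurements do not show.

```python
BUDGET_US = 1_000_000 / 24  # about 41,666.7 µs per frame at 24 fps

# Mean and max per phase from the section 8 table (µs).
mean_us = {"memmove": 1_168, "update_nv_texture": 4_515, "copy_and_present": 4_508}
max_us = {"memmove": 3_655, "update_nv_texture": 12_469, "copy_and_present": 17_892}

total_us = sum(mean_us.values())                  # 10,191 µs
used_pct = 100 * total_us / BUDGET_US             # about 24.5 %
headroom_ms = (BUDGET_US - total_us) / 1000       # about 31.5 ms left per frame
worst_case_us = sum(max_us.values())              # 34,016 µs if all spikes coincide
worst_case_pct = 100 * worst_case_us / BUDGET_US  # about 81.6 %, still under budget
```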

Optimization history:

  • a201594 → da02e74: replaced extract_dup + from_buffer_copy (2 copies, 6 MB/frame) with buffer.map(READ) + memmove into a pre-allocated ctypes array (1 copy, 3.1 MB). Saved ~3 MB/frame allocation; copy cost reduced by 8% but still ~81% of budget.

  • da02e74 → 995830e: identified that the 3.1 MB memmove is needed only because the appsink receives full 1920×1080 frames while the display is 640×480. Inserted a GstBin containing videoscale(method=nearest-neighbour) → capsfilter(NV12, 640×480) → appsink as the playbin video-sink. The GStreamer pipeline thread now performs the software scale before Python sees the frame, so Python receives only 460 KB (6.7× smaller). Memmove drops from 32 ms to 1 ms (31× improvement, 2.5% of budget). FPS and drop count are unchanged (23.99, 0). A/V drift improved from −38 ms to −7 ms.
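The scaler bin described above can be sketched as follows, assuming PyGObject with Gst 1.0. The element chain mirrors the text. The appsink name "videosink" and both helper names are hypothetical. There is no videoconvert in the chain, so this assumes the decoder already outputs NV12; otherwise caps negotiation would likely fail.

```python
def scaler_bin_description(width=640, height=480):
    """gst-launch style description of the video-sink bin."""
    # Nearest-neighbour videoscale, then a caps filter forcing NV12 at the
    # target size, then the appsink that the Python callback reads from.
    return (
        "videoscale method=nearest-neighbour ! "
        f"video/x-raw,format=NV12,width={width},height={height} ! "
        "appsink name=videosink"
    )


def attach_scaler(playbin, width=640, height=480):
    """Install the bin as playbin's video-sink and return its appsink."""
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    # ghost_unlinked_pads=True exposes videoscale's unlinked sink pad on the bin.
    sink_bin = Gst.parse_bin_from_description(
        scaler_bin_description(width, height), True)
    playbin.set_property("video-sink", sink_bin)
    return sink_bin.get_by_name("videosink")
```

A caller would create the pipeline with Gst.ElementFactory.make("playbin", None) and pass it to attach_scaler before setting the pipeline to PLAYING.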

Alternatives tested during 995830e (the first two were rejected; the last was adopted):

| Variant | Result | Root cause |
|---|---|---|
| Bilinear videoscale (no queue) | 20.92 fps, 46 drops | Bilinear reads adjacent rows → loads ~89% of source cache lines, similar cost to memmove; scheduling pressure causes drops |
| Nearest-neighbour + leaky=2 queue | 1.86 fps, 30 drops | leaky=2 allows mppvideodec to race ahead; queue fills and drops ~93% of frames as stale |
| Nearest-neighbour, no queue | 23.99 fps, 0 drops | Nearest reads ~44% of source cache lines; back-pressure from appsink naturally rate-limits mppvideodec |
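The ~44% and ~89% figures are consistent with a simple row-count model. Downscaling 1080 rows to 480 gives a vertical step of 2.25 source rows. Nearest-neighbour reads one source row per output row, while bilinear reads two neighbouring rows. The model assumes every cache line of a touched row is read, which is plausible at a 3× horizontal step with 64-byte cache lines. It is an idealized model, not a description of videoscale's actual sampling code.

```python
import math


def rows_touched(src_h, dst_h, method):
    """Count distinct source rows read when scaling src_h rows to dst_h rows."""
    scale = src_h / dst_h
    touched = set()
    for y in range(dst_h):
        # Centre of output row y mapped back into source row coordinates.
        centre = (y + 0.5) * scale
        if method == "nearest":
            touched.add(min(int(centre), src_h - 1))
        elif method == "bilinear":
            top = math.floor(centre - 0.5)
            for row in (top, top + 1):  # the two rows being interpolated
                touched.add(min(max(row, 0), src_h - 1))
        else:
            raise ValueError(f"unknown method: {method}")
    return len(touched)


for method in ("nearest", "bilinear"):
    n = rows_touched(1080, 480, method)
    print(method, n, f"{n / 1080:.0%}")
```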

Key observations (995830e):

  • Memmove reduced from 32 ms (3.1 MB) to ~1 ms (460 KB) — 31× improvement
  • No FPS or drop regression vs unscaled path
  • A/V drift improved significantly (−7 ms vs −38 ms)
  • SW nearest-neighbour scale on A35 costs ~14 ms per frame (estimated from cache line count), but this happens synchronously in the GStreamer pipeline thread BEFORE the appsink callback, not in the Python memmove measurement
  • The remaining 97.5% of the Python-side frame budget is available for SDL upload, HUD rendering, and other main-loop work (the software scale runs separately in the GStreamer pipeline thread)

Blockers Or Open Questions

  • SDL2_ttf system library needed for text rendering (sudo dnf install SDL2_ttf on Fedora, sudo apt install libsdl2-ttf-2.0-0 on Debian/Ubuntu). The app handles its absence gracefully but will show no text.
  • Integrated playback requires system GStreamer plus Python GI bindings (for Fedora: python3-gobject gstreamer1 gstreamer1-plugins-base gstreamer1-plugins-good; add codec/plugin packages as needed for target media).
  • Root browse verified against two real DLNA servers on the LAN.
  • End-to-end playback validation on R36S hardware (visual smoke test via the MatHacks.sh launcher) is still pending.
  • The current SDL-texture path avoids window-manager dependencies but may still need optimization on low-end hardware if texture upload cost is too high (the on-device NV12 upload currently measures ~4.5 ms mean per frame).
  • The first Miniforge install attempt on the physical R36S failed because the downloaded installer was corrupt and crashed during extraction.
  • The physical R36S now has Miniconda installed at /home/ark/miniconda3; the dedicated app env exists at /home/ark/miniconda3/envs/r36s-dlna-browser, but package solves can hang on-device and are being handled incrementally.
  • The dedicated R36S conda env requires LD_LIBRARY_PATH=/home/ark/miniconda3/envs/r36s-dlna-browser/lib for GI and GStreamer shared libraries to resolve correctly.
  • GStreamer imports now succeed in the dedicated env (GLib, GObject, Gst, GstApp, GstVideo), and Application imports cleanly.
  • ArkOS menu launch works on the physical device, and DLNA browsing reaches real MiniDLNA content.
  • Real playback is currently blocked by missing decoder elements in the device env: direct probing of a MiniDLNA .mkv URL showed missing H.264 High Profile and MPEG-4 AAC decoders, while the user-facing "can't play a text file" message is a misleading fallback caused by an additional text stream in the container.
  • RESOLVED (workaround): the gst-libav conda package on linux-aarch64 has an ABI mismatch that cannot be fixed within conda. libavcodec.so links libdav1d.so.6 (from dav1d <1.3), but only dav1d 1.4.x (.so.7) is available. Via libxml2-16 it also pulls libicuuc.so.78, which is not packaged for linux-aarch64 on conda-forge. Solution: install the system gstreamer1.0-libav package (v1.16.1) via apt and use GST_PLUGIN_PATH + LD_PRELOAD to expose its plugins to the conda Python runtime.
  • On the physical R36S, avdec_h264 and avdec_aac now register and resolve when launched with LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1 and GST_PLUGIN_PATH=/usr/lib/aarch64-linux-gnu/gstreamer-1.0. The LD_PRELOAD is required to avoid the "cannot allocate memory in static TLS block" error caused by the conda libgomp.so being loaded late by dlopen.
  • These variables are now persisted in /home/ark/miniconda3/envs/r36s-dlna-browser/etc/conda/activate.d/gst-env.sh and explicitly set in deploy/run.sh.
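The launch-time variables above can be assembled as follows, using the paths recorded in this file. The dynamic loader reads LD_PRELOAD at process start, so it must already be in the environment of the launched process. Setting it from inside a running Python interpreter does not affect that interpreter. The function name device_launch_env is hypothetical; deploy/run.sh remains the actual launcher.

```python
import os

# Paths as recorded in this status file.
ENV_PREFIX = "/home/ark/miniconda3/envs/r36s-dlna-browser"
SYSTEM_LIB = "/usr/lib/aarch64-linux-gnu"


def _prepend(value, existing):
    """Prepend value to a colon-separated variable, keeping any old entries."""
    return ":".join(part for part in (value, existing) if part)


def device_launch_env(base=None):
    """Hypothetical helper: environment for launching the app on the R36S."""
    env = dict(os.environ if base is None else base)
    # GI and GStreamer shared libraries from the dedicated conda env.
    env["LD_LIBRARY_PATH"] = _prepend(f"{ENV_PREFIX}/lib", env.get("LD_LIBRARY_PATH"))
    # System gstreamer1.0-libav plugins (avdec_h264, avdec_aac).
    env["GST_PLUGIN_PATH"] = _prepend(f"{SYSTEM_LIB}/gstreamer-1.0", env.get("GST_PLUGIN_PATH"))
    # Preload system libgomp so it is not dlopen'ed late (static TLS block error).
    env["LD_PRELOAD"] = _prepend(f"{SYSTEM_LIB}/libgomp.so.1", env.get("LD_PRELOAD"))
    return env
```

A Python launcher would pass the result to subprocess.run(cmd, env=device_launch_env()), where cmd is whatever command starts the app.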
  1. Investigate SDL_RenderPresent blocking — the 18ms spike in SDL_RenderPresent (KMSDRM vsync stall) is the likely root cause of sync jitter in the full app. Options:
    • Move the render call off the main thread into a dedicated render thread, giving the GStreamer callback thread uncontested GIL access.
    • Or call SDL_RenderSetVSync(renderer, 0) (available since SDL 2.0.18) to disable vsync and drive timing manually from GStreamer PTS, at the cost of tearing risk.
    • Or render only when has_new_frame() is true, and otherwise sleep for short intervals, so the main loop does not sit in a blocking RenderPresent when there is nothing new to show.
  2. Run a visual smoke test via MatHacks.sh launcher to confirm HUD renders cleanly alongside video under KMSDRM.
  3. SDL_UpdateNVTexture for 640×480 NV12 costs ~4.5ms mean — acceptable. No further optimization needed here.
  4. avdec_hevc is still missing; mppvideodec handles HEVC via HW so this is not critical.
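The third option under step 1 (render only on a new frame) could look roughly like this. FrameSlot and main_loop_step are hypothetical names; the project's own has_new_frame() may live elsewhere. The render parameter stands in for the SDL upload, RenderCopy and RenderPresent calls. This only skips presents when nothing has changed. A frame that is rendered can still block in RenderPresent under KMSDRM vsync.

```python
import threading
import time


class FrameSlot:
    """Hypothetical single-slot mailbox between appsink callback and SDL loop."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._new = False

    def publish(self, frame):
        """Called from the GStreamer thread; newer frames overwrite older ones."""
        with self._lock:
            self._frame, self._new = frame, True

    def has_new_frame(self):
        with self._lock:
            return self._new

    def take(self):
        """Called from the SDL main thread; clears the new-frame flag."""
        with self._lock:
            self._new = False
            return self._frame


def main_loop_step(slot, render, hud_dirty=False, idle_sleep_s=0.002):
    """Render when a frame or the HUD changed; otherwise sleep briefly."""
    if slot.has_new_frame() or hud_dirty:
        render(slot.take())  # upload + RenderCopy + RenderPresent
        return True
    time.sleep(idle_sleep_s)  # short sleep keeps input handling responsive
    return False
```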