test,docs: fix section 6 vsink ref; update docs with SDL timing results and RenderPresent root cause

2 weeks ago · c334bfcc83
2 changed files with 43 additions and 13 deletions
--- a/docs/development-status.md
+++ b/docs/development-status.md
@ -57,12 +57,35 @@ Milestone 3 — SDL Video Viewport, HUD, and Wayland Compatibility
 All runs performed on the physical R36S (RK3326, 4× A35 @ 1.3 GHz, 1 GB RAM) over SSH.
 Stream: 1920×1080 H.264 MKV @ 24 fps via MiniDLNA over LAN. Frame budget: 41.7 ms.
 ### GStreamer-only benchmark (no SDL)
 | Commit | Copy / pipeline strategy | Copy mean | Copy % budget | FPS | Dropped | A/V drift |
 |--------|--------------------------|-----------|---------------|-----|---------|-----------|
 | `a201594` | `extract_dup` → bytes + `from_buffer_copy` → ctypes (2 copies, 6 MB/frame) | 36,499 µs | 87.6% | 24.01 | 1 | −42.8 ms |
 | `da02e74` | `buffer.map(READ)` + `memmove` into reusable ctypes array (1 copy, 3.1 MB/frame) | 33,551 µs | 80.5% | 23.98 | 0 | −38.0 ms |
 | `995830e` | `videoscale(nearest)→640×480` in GstBin + `memmove` (1 copy, **0.46 MB/frame**) | **1,033 µs** | **2.5%** | **23.99** | **0** | **−6.9 ms** |
 ### End-to-end SDL render loop (section 8 of `test_video_playback_device.py`)
 **Commit `ac7aa91`** — real SDL window (720×720 KMSDRM), NV12 texture (640×480), same GstBin pipeline as the app:
 | Phase | Mean | Max | % of 41.7ms budget |
 |-------|------|-----|--------------------|
 | memmove (GStreamer thread) | 1,168 µs | 3,655 µs | 2.8% |
 | SDL_UpdateNVTexture (main thread) | 4,515 µs | 12,469 µs | 10.8% |
 | SDL_RenderCopy + SDL_RenderPresent (main thread) | 4,508 µs | 17,892 µs | 10.8% |
 | **Total (copy + upload + render)** | **10,191 µs** | — | **24.5%** |
 | **FPS** | **24.03** | — | **0 dropped** |
 **Key finding from section 8:**
 - memmove is not the bottleneck (2.8% budget, 1.2ms mean).
 - `SDL_UpdateNVTexture` for the 640×480 NV12 texture costs ~4.5ms mean (10.8%).
 - `SDL_RenderPresent` costs ~4.5ms mean (10.8%) with spikes to 18ms (KMSDRM vsync stall).
 - Total render overhead visible to the main thread: ~10ms, well within the 41.7ms budget.
 - **The app-level desync is not caused by frame copy or SDL upload time. The likely root cause is `SDL_RenderPresent` blocking the main thread for up to 18ms, which can delay the GIL release and starve the GStreamer callback thread. This is a main-loop scheduling issue, not a per-frame cost issue.**
 - 24.5% budget used in section 8 means ~31ms remains — sufficient for a HUD render pass on top of video.
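
The budget figures in the section 8 table can be re-derived from the per-phase means. The following is a minimal sketch that uses only the mean values reported above; the variable names are illustrative and do not come from the repository.

```python
# Re-derive the section 8 budget percentages from the reported means.
# All names here are illustrative; the numbers are copied from the table.
FPS = 24
FRAME_BUDGET_US = 1_000_000 / FPS  # about 41,667 µs per frame at 24 fps

phase_mean_us = {
    "memmove": 1_168,
    "SDL_UpdateNVTexture": 4_515,
    "SDL_RenderCopy + SDL_RenderPresent": 4_508,
}

total_us = sum(phase_mean_us.values())
used_pct = 100 * total_us / FRAME_BUDGET_US
headroom_ms = (FRAME_BUDGET_US - total_us) / 1000

for name, mean_us in phase_mean_us.items():
    print(f"{name}: {100 * mean_us / FRAME_BUDGET_US:.1f}% of budget")
print(f"total: {total_us} µs ({used_pct:.1f}%), headroom: {headroom_ms:.1f} ms")
```

The headroom is computed from means only; the 17,892 µs `SDL_RenderPresent` maximum is not included, so worst-case frames leave less room than this figure suggests.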
 **Optimization history:**
 - `a201594` → `da02e74`: replaced `extract_dup + from_buffer_copy` (2 copies, 6 MB/frame) with `buffer.map(READ) + memmove` into a pre-allocated ctypes array (1 copy, 3.1 MB). Saved ~3 MB/frame allocation; copy cost reduced by 8% but still ~81% of budget.
@ -105,9 +128,10 @@ Stream: 1920×1080 H.264 MKV @ 24 fps via MiniDLNA over LAN. Frame budget: 41.7
 ## Next Recommended Actions
-1. Run a visual playback smoke test on device directly via the app launcher (MatHacks.sh) to confirm HUD and video render correctly together under KMSDRM with the videoscale path active (nearest-neighbour 640×480 NV12).
+1. **Investigate SDL_RenderPresent blocking** — the 18ms spike in `SDL_RenderPresent` (KMSDRM vsync stall) is the likely root cause of sync jitter in the full app. Options:
-2. Measure SDL_UpdateNVTexture upload cost for the now-smaller 640×480 texture (was 1920×1080). If it is sub-millisecond, the render path is considered optimized.
+   - Move the render call off the main thread into a dedicated render thread, giving the GStreamer callback thread uncontested GIL access.
-3. If visual quality from nearest-neighbour scaling is noticeably poor on-device, switch `scale.set_property("method", 1)` (bilinear) and re-benchmark; the bilinear result (20.92 fps, 46 drops) only applied to the benchmark stream — actual app playback may behave differently since the GStreamer pipeline structure is slightly different inside the real app vs the benchmark.
+   - Or call `SDL_RenderSetVSync(renderer, 0)` (SDL 2.0.18+) to disable vsync and drive timing manually from GStreamer PTS, at the cost of tearing risk.
-4. Consider profiling the SDL render loop under combined video+HUD load to confirm 30+ fps UI responsiveness alongside decoding.
+   - Or cap renders to only happen when `has_new_frame()` is true and otherwise sleep shorter intervals to avoid the long blocking RenderPresent.
-5. Investigate DMA-buf import as a future zero-copy path: gst-mpp may expose DRM DMA-buf fds that SDL's KMSDRM backend can import directly via `SDL_CreateTextureFromSurface` or a custom EGL path, eliminating the CPU memmove and SW scale entirely. This is a significant engineering effort and is not needed given current performance.
+2. Run a visual smoke test via MatHacks.sh launcher to confirm HUD renders cleanly alongside video under KMSDRM.
-6. `avdec_hevc` is still missing (HEVC decoders not in system apt `gstreamer1.0-libav 1.16.1`); `mppvideodec` covers H.264/H.265/VP8/VP9 via HW so this is less critical now.
+3. SDL_UpdateNVTexture for 640×480 NV12 costs ~4.5ms mean — acceptable. No further optimization needed here.
 4. `avdec_hevc` is still missing; `mppvideodec` handles HEVC via HW so this is not critical.
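
The third option under action 1 (render only when `has_new_frame()` is true, otherwise sleep briefly) could look roughly like the sketch below. It is not the app's implementation: `FrameSlot`, `render_loop`, and `present` are hypothetical names, and `present` stands in for the texture upload plus `SDL_RenderCopy`/`SDL_RenderPresent` calls so the example runs without SDL or GStreamer.

```python
import threading
import time


class FrameSlot:
    """Hypothetical single-slot mailbox filled by the GStreamer callback thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def put(self, frame):
        with self._lock:
            self._frame = frame  # newest frame wins; an unread older one is dropped

    def has_new_frame(self):
        with self._lock:
            return self._frame is not None

    def take(self):
        with self._lock:
            frame, self._frame = self._frame, None
            return frame


def render_loop(slot, present, stop, idle_sleep_s=0.002):
    """Call present(frame) only for new frames; otherwise sleep for a short interval."""
    presented = 0
    while not stop.is_set():
        if slot.has_new_frame():
            present(slot.take())  # stand-in for upload + RenderCopy + RenderPresent
            presented += 1
        else:
            time.sleep(idle_sleep_s)  # time.sleep releases the GIL while waiting
    return presented
```

A short idle sleep keeps the main thread from presenting redundant frames, so a potentially blocking present call happens at most once per decoded frame. The 2 ms default is an arbitrary placeholder, not a measured value.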
--- a/tests/test_video_playback_device.py
+++ b/tests/test_video_playback_device.py
@ -247,14 +247,20 @@ else:
    live_frames = 0
    live_error = None
    LIVE_PIPE = (
        f"playbin uri=\"{test_url}\" "
        f"video-sink=\"videoconvert ! video/x-raw,format=BGRA ! appsink name=vsink emit-signals=true max-buffers=2 drop=true\""
    )
    try:
-        pipe = Gst.parse_launch(LIVE_PIPE)
+        # Build the pipeline element-by-element so we can hold a direct
-        vsink = pipe.get_by_name("vsink")
+        # reference to the appsink (parse_launch embeds it in a bin and
        # get_by_name returns None when the bin wraps a nested pipeline string).
        pipe = Gst.ElementFactory.make("playbin", "live_player")
        vsink = Gst.ElementFactory.make("appsink", "vsink")
        if pipe is None or vsink is None:
            raise RuntimeError("playbin or appsink not available")
        vsink.set_property("emit-signals", True)
        vsink.set_property("max-buffers", 2)
        vsink.set_property("drop", True)
        vsink.set_property("caps", Gst.Caps.from_string("video/x-raw,format=BGRA"))
        pipe.set_property("video-sink", vsink)
        pipe.set_property("uri", test_url if "://" in test_url else Gst.filename_to_uri(test_url))
        def _on_live_sample(sink, *_):
            global live_frames