diff --git a/docs/development-status.md b/docs/development-status.md index 755834d..efe4322 100644 --- a/docs/development-status.md +++ b/docs/development-status.md @@ -57,12 +57,35 @@ Milestone 3 — SDL Video Viewport, HUD, and Wayland Compatibility All runs performed on the physical R36S (RK3326, 4× A35 @ 1.3 GHz, 1 GB RAM) over SSH. Stream: 1920×1080 H.264 MKV @ 24 fps via MiniDLNA over LAN. Frame budget: 41.7 ms. +### GStreamer-only benchmark (no SDL) + | Commit | Copy / pipeline strategy | Copy mean | Copy % budget | FPS | Dropped | A/V drift | |--------|--------------------------|-----------|---------------|-----|---------|-----------| | `a201594` | `extract_dup` → bytes + `from_buffer_copy` → ctypes (2 copies, 6 MB/frame) | 36,499 µs | 87.6% | 24.01 | 1 | −42.8 ms | | `da02e74` | `buffer.map(READ)` + `memmove` into reusable ctypes array (1 copy, 3.1 MB/frame) | 33,551 µs | 80.5% | 23.98 | 0 | −38.0 ms | | `995830e` | `videoscale(nearest)→640×480` in GstBin + `memmove` (1 copy, **0.46 MB/frame**) | **1,033 µs** | **2.5%** | **23.99** | **0** | **−6.9 ms** | +### End-to-end SDL render loop (section 8 of `test_video_playback_device.py`) + +**Commit `ac7aa91`** — real SDL window (720×720 KMSDRM), NV12 texture (640×480), same GstBin pipeline as the app: + +| Phase | Mean | Max | % of 41.7ms budget | +|-------|------|-----|--------------------| +| memmove (GStreamer thread) | 1,168 µs | 3,655 µs | 2.8% | +| SDL_UpdateNVTexture (main thread) | 4,515 µs | 12,469 µs | 10.8% | +| SDL_RenderCopy + SDL_RenderPresent (main thread) | 4,508 µs | 17,892 µs | 10.8% | +| **Total (copy + upload + render)** | **10,191 µs** | — | **24.5%** | +| **FPS** | **24.03** | — | **0 dropped** | + +**Key finding from section 8:** +- memmove is not the bottleneck (2.8% budget, 1.2ms mean). +- `SDL_UpdateNVTexture` for the 640×480 NV12 texture costs ~4.5ms mean (10.8%). +- `SDL_RenderPresent` costs ~4.5ms mean (10.8%) with spikes to 18ms (KMSDRM vsync stall). 
+- Total render overhead visible to the main thread: ~10ms, well within the 41.7ms budget. +- **The app-level desync is NOT caused by frame copy or SDL upload time. Root cause of desync: `SDL_RenderPresent` blocks the main thread for up to 18ms, which delays the GIL release and can starve the GStreamer callback thread. This is a main-loop scheduling issue, not a per-frame cost issue.** +- 24.5% budget used in section 8 means ~31ms remains — sufficient for a HUD render pass on top of video. + + **Optimization history:** - `a201594` → `da02e74`: replaced `extract_dup + from_buffer_copy` (2 copies, 6 MB/frame) with `buffer.map(READ) + memmove` into a pre-allocated ctypes array (1 copy, 3.1 MB). Saved ~3 MB/frame allocation; copy cost reduced by 8% but still ~81% of budget. @@ -105,9 +128,10 @@ Stream: 1920×1080 H.264 MKV @ 24 fps via MiniDLNA over LAN. Frame budget: 41.7 ## Next Recommended Actions -1. Run a visual playback smoke test on device directly via the app launcher (MatHacks.sh) to confirm HUD and video render correctly together under KMSDRM with the videoscale path active (nearest-neighbour 640×480 NV12). -2. Measure SDL_UpdateNVTexture upload cost for the now-smaller 640×480 texture (was 1920×1080). If it is sub-millisecond, the render path is considered optimized. -3. If visual quality from nearest-neighbour scaling is noticeably poor on-device, switch `scale.set_property("method", 1)` (bilinear) and re-benchmark; the bilinear result (20.92 fps, 46 drops) only applied to the benchmark stream — actual app playback may behave differently since the GStreamer pipeline structure is slightly different inside the real app vs the benchmark. -4. Consider profiling the SDL render loop under combined video+HUD load to confirm 30+ fps UI responsiveness alongside decoding. -5. 
Investigate DMA-buf import as a future zero-copy path: gst-mpp may expose DRM DMA-buf fds that SDL's KMSDRM backend can import directly via `SDL_CreateTextureFromSurface` or a custom EGL path, eliminating the CPU memmove and SW scale entirely. This is a significant engineering effort and is not needed given current performance.
-6. `avdec_hevc` is still missing (HEVC decoders not in system apt `gstreamer1.0-libav 1.16.1`); `mppvideodec` covers H.264/H.265/VP8/VP9 via HW so this is less critical now.
\ No newline at end of file
+1. **Investigate SDL_RenderPresent blocking** — the 18ms spike in `SDL_RenderPresent` (KMSDRM vsync stall) is the likely root cause of sync jitter in the full app. Options:
+   - Move the render call off the main thread into a dedicated render thread, giving the GStreamer callback thread uncontested GIL access.
+   - Or call `SDL_SetRenderVSync(renderer, 0)` to disable vsync and drive timing manually from GStreamer PTS, at the cost of tearing risk.
+   - Or render only when `has_new_frame()` is true, sleeping in short intervals otherwise, to avoid the long blocking `SDL_RenderPresent` call.
+2. Run a visual smoke test via the MatHacks.sh launcher to confirm the HUD renders cleanly alongside video under KMSDRM.
+3. `SDL_UpdateNVTexture` for the 640×480 NV12 texture costs ~4.5ms mean — acceptable. No further optimization is needed here.
+4. `avdec_hevc` is still missing; `mppvideodec` handles HEVC via HW, so this is not critical.
\ No newline at end of file
diff --git a/tests/test_video_playback_device.py b/tests/test_video_playback_device.py
index f8d4a4b..003bc8f 100644
--- a/tests/test_video_playback_device.py
+++ b/tests/test_video_playback_device.py
@@ -247,14 +247,20 @@ else:
     live_frames = 0
     live_error = None
-    LIVE_PIPE = (
-        f"playbin uri=\"{test_url}\" "
-        f"video-sink=\"videoconvert ! video/x-raw,format=BGRA ! 
appsink name=vsink emit-signals=true max-buffers=2 drop=true\"" - ) - try: - pipe = Gst.parse_launch(LIVE_PIPE) - vsink = pipe.get_by_name("vsink") + # Build the pipeline element-by-element so we can hold a direct + # reference to the appsink (parse_launch embeds it in a bin and + # get_by_name returns None when the bin wraps a nested pipeline string). + pipe = Gst.ElementFactory.make("playbin", "live_player") + vsink = Gst.ElementFactory.make("appsink", "vsink") + if pipe is None or vsink is None: + raise RuntimeError("playbin or appsink not available") + vsink.set_property("emit-signals", True) + vsink.set_property("max-buffers", 2) + vsink.set_property("drop", True) + vsink.set_property("caps", Gst.Caps.from_string("video/x-raw,format=BGRA")) + pipe.set_property("video-sink", vsink) + pipe.set_property("uri", test_url if "://" in test_url else Gst.filename_to_uri(test_url)) def _on_live_sample(sink, *_): global live_frames
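As a sanity check on the section-8 numbers quoted in the doc changes above, the budget percentages follow from simple arithmetic. A minimal standalone sketch (Python; the phase timings are copied from the section-8 table, the frame budget is derived from 24 fps — this is illustrative, not code from the app):

```python
# Frame budget at 24 fps, in microseconds (~41,667 µs, i.e. the 41.7 ms quoted above).
FRAME_BUDGET_US = 1_000_000 / 24

# Mean per-phase costs from the section-8 table (commit ac7aa91), in µs.
phases = {
    "memmove (GStreamer thread)": 1_168,
    "SDL_UpdateNVTexture": 4_515,
    "SDL_RenderCopy + SDL_RenderPresent": 4_508,
}

for name, mean_us in phases.items():
    print(f"{name}: {mean_us / FRAME_BUDGET_US:.1%} of budget")

total_us = sum(phases.values())
print(f"total: {total_us:,} µs = {total_us / FRAME_BUDGET_US:.1%} of budget")
# → total: 10,191 µs = 24.5% of budget, leaving roughly 31 ms per frame for HUD work.
```

This reproduces the 2.8% / 10.8% / 10.8% per-phase figures and the 24.5% total from the table, confirming the headroom claim made for the HUD render pass.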