
test,docs: fix section 6 vsink ref; update docs with SDL timing results and RenderPresent root cause

main
Matteo Benedetto 2 weeks ago
parent
commit
c334bfcc83
  1. docs/development-status.md (36 changed lines)
  2. tests/test_video_playback_device.py (20 changed lines)

36
docs/development-status.md

@@ -57,12 +57,35 @@ Milestone 3 — SDL Video Viewport, HUD, and Wayland Compatibility
 All runs performed on the physical R36S (RK3326, 4× A35 @ 1.3 GHz, 1 GB RAM) over SSH.
 Stream: 1920×1080 H.264 MKV @ 24 fps via MiniDLNA over LAN. Frame budget: 41.7 ms.
+### GStreamer-only benchmark (no SDL)
 | Commit | Copy / pipeline strategy | Copy mean | Copy % budget | FPS | Dropped | A/V drift |
 |--------|--------------------------|-----------|---------------|-----|---------|-----------|
 | `a201594` | `extract_dup` → bytes + `from_buffer_copy` → ctypes (2 copies, 6 MB/frame) | 36,499 µs | 87.6% | 24.01 | 1 | −42.8 ms |
 | `da02e74` | `buffer.map(READ)` + `memmove` into reusable ctypes array (1 copy, 3.1 MB/frame) | 33,551 µs | 80.5% | 23.98 | 0 | −38.0 ms |
 | `995830e` | `videoscale(nearest)→640×480` in GstBin + `memmove` (1 copy, **0.46 MB/frame**) | **1,033 µs** | **2.5%** | **23.99** | **0** | **−6.9 ms** |
+### End-to-end SDL render loop (section 8 of `test_video_playback_device.py`)
+**Commit `ac7aa91`** — real SDL window (720×720 KMSDRM), NV12 texture (640×480), same GstBin pipeline as the app:
+| Phase | Mean | Max | % of 41.7ms budget |
+|-------|------|-----|--------------------|
+| memmove (GStreamer thread) | 1,168 µs | 3,655 µs | 2.8% |
+| SDL_UpdateNVTexture (main thread) | 4,515 µs | 12,469 µs | 10.8% |
+| SDL_RenderCopy + SDL_RenderPresent (main thread) | 4,508 µs | 17,892 µs | 10.8% |
+| **Total (copy + upload + render)** | **10,191 µs** | — | **24.5%** |
+| **FPS** | **24.03** | — | **0 dropped** |
+**Key finding from section 8:**
+- memmove is not the bottleneck (2.8% budget, 1.2ms mean).
+- `SDL_UpdateNVTexture` for the 640×480 NV12 texture costs ~4.5ms mean (10.8%).
+- `SDL_RenderPresent` costs ~4.5ms mean (10.8%) with spikes to 18ms (KMSDRM vsync stall).
+- Total render overhead visible to the main thread: ~10ms, well within the 41.7ms budget.
+- **The app-level desync is NOT caused by frame copy or SDL upload time. Root cause of desync: `SDL_RenderPresent` blocks the main thread for up to 18ms, which delays the GIL release and can starve the GStreamer callback thread. This is a main-loop scheduling issue, not a per-frame cost issue.**
+- 24.5% budget used in section 8 means ~31ms remains — sufficient for a HUD render pass on top of video.
 **Optimization history:**
 - `a201594` → `da02e74`: replaced `extract_dup + from_buffer_copy` (2 copies, 6 MB/frame) with `buffer.map(READ) + memmove` into a pre-allocated ctypes array (1 copy, 3.1 MB). Saved ~3 MB/frame allocation; copy cost reduced by 8% but still ~81% of budget.
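Reviewer note: the percentages in the section 8 table can be reproduced from the 24 fps frame budget. The following is a minimal sketch in Python that uses only the mean values reported in the table; the names `FRAME_BUDGET_US`, `PHASE_MEANS_US`, and `budget_percent` are hypothetical and do not appear in the repository.

```python
# Frame budget at 24 fps in microseconds (1 s / 24 is about 41,667 µs, i.e. 41.7 ms).
FRAME_BUDGET_US = 1_000_000 / 24

# Mean per-phase costs copied from the section 8 table (commit ac7aa91), in µs.
PHASE_MEANS_US = {
    "memmove": 1_168,
    "SDL_UpdateNVTexture": 4_515,
    "SDL_RenderCopy + SDL_RenderPresent": 4_508,
}


def budget_percent(cost_us, budget_us=FRAME_BUDGET_US):
    """Return a cost as a percentage of the frame budget, rounded to 0.1."""
    return round(100.0 * cost_us / budget_us, 1)


total_us = sum(PHASE_MEANS_US.values())
headroom_ms = (FRAME_BUDGET_US - total_us) / 1000.0

for name, cost_us in PHASE_MEANS_US.items():
    print(f"{name}: {budget_percent(cost_us)}% of budget")
print(f"total: {total_us} µs ({budget_percent(total_us)}%), headroom {headroom_ms:.1f} ms")
```

The sum of the three means matches the table's total row, and the remaining headroom agrees with the "~31ms remains" estimate in the key findings. Note that these are means; the maxima in the table (for example 17,892 µs for present) are what the root-cause bullet is concerned with.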
@@ -105,9 +128,10 @@ Stream: 1920×1080 H.264 MKV @ 24 fps via MiniDLNA over LAN. Frame budget: 41.7
 ## Next Recommended Actions
-1. Run a visual playback smoke test on device directly via the app launcher (MatHacks.sh) to confirm HUD and video render correctly together under KMSDRM with the videoscale path active (nearest-neighbour 640×480 NV12).
-2. Measure SDL_UpdateNVTexture upload cost for the now-smaller 640×480 texture (was 1920×1080). If it is sub-millisecond, the render path is considered optimized.
-3. If visual quality from nearest-neighbour scaling is noticeably poor on-device, switch `scale.set_property("method", 1)` (bilinear) and re-benchmark; the bilinear result (20.92 fps, 46 drops) only applied to the benchmark stream — actual app playback may behave differently since the GStreamer pipeline structure is slightly different inside the real app vs the benchmark.
-4. Consider profiling the SDL render loop under combined video+HUD load to confirm 30+ fps UI responsiveness alongside decoding.
-5. Investigate DMA-buf import as a future zero-copy path: gst-mpp may expose DRM DMA-buf fds that SDL's KMSDRM backend can import directly via `SDL_CreateTextureFromSurface` or a custom EGL path, eliminating the CPU memmove and SW scale entirely. This is a significant engineering effort and is not needed given current performance.
-6. `avdec_hevc` is still missing (HEVC decoders not in system apt `gstreamer1.0-libav 1.16.1`); `mppvideodec` covers H.264/H.265/VP8/VP9 via HW so this is less critical now.
+1. **Investigate SDL_RenderPresent blocking** — the 18ms spike in `SDL_RenderPresent` (KMSDRM vsync stall) is the likely root cause of sync jitter in the full app. Options:
+   - Move the render call off the main thread into a dedicated render thread, giving the GStreamer callback thread uncontested GIL access.
+   - Or call `SDL_SetRenderVSync(renderer, 0)` to disable vsync and drive timing manually from GStreamer PTS, at the cost of tearing risk.
+   - Or cap renders to only happen when `has_new_frame()` is true and otherwise sleep shorter intervals to avoid the long blocking RenderPresent.
+2. Run a visual smoke test via MatHacks.sh launcher to confirm HUD renders cleanly alongside video under KMSDRM.
+3. SDL_UpdateNVTexture for 640×480 NV12 costs ~4.5ms mean — acceptable. No further optimization needed here.
+4. `avdec_hevc` is still missing; `mppvideodec` handles HEVC via HW so this is not critical.
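Reviewer note: the third option under action 1 (render only when `has_new_frame()` is true, otherwise sleep briefly) could look roughly like the sketch below. This is an illustration under stated assumptions, not the app's code: `FrameSlot`, `render_loop`, the `present` callback, and the 2 ms idle sleep are all hypothetical, and the diff does not show how the real `has_new_frame()` is implemented. The `present` callback stands in for the texture upload plus `SDL_RenderCopy` and `SDL_RenderPresent` sequence.

```python
import threading
import time


class FrameSlot:
    """Single-slot mailbox: the producer overwrites, the consumer takes the latest."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._fresh = False

    def publish(self, frame):
        # Called from the streaming thread (e.g. an appsink callback).
        with self._lock:
            self._frame = frame
            self._fresh = True

    def has_new_frame(self):
        with self._lock:
            return self._fresh

    def take(self):
        # Called from the render loop; clears the "fresh" flag.
        with self._lock:
            self._fresh = False
            return self._frame


def render_loop(slot, present, stop, idle_sleep_s=0.002):
    """Call present(frame) only for fresh frames; otherwise sleep briefly.

    Returns the number of frames presented before `stop` was set.
    """
    presented = 0
    while not stop.is_set():
        if slot.has_new_frame():
            present(slot.take())
            presented += 1
        else:
            time.sleep(idle_sleep_s)
    return presented
```

`time.sleep` releases the GIL while waiting, so the streaming thread can run its callback during idle periods. Whether this is enough to remove the 18ms present stall on the device is untested here; it only avoids presenting when nothing has changed.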

20
tests/test_video_playback_device.py

@@ -247,14 +247,20 @@ else:
     live_frames = 0
     live_error = None
-    LIVE_PIPE = (
-        f"playbin uri=\"{test_url}\" "
-        f"video-sink=\"videoconvert ! video/x-raw,format=BGRA ! appsink name=vsink emit-signals=true max-buffers=2 drop=true\""
-    )
     try:
-        pipe = Gst.parse_launch(LIVE_PIPE)
-        vsink = pipe.get_by_name("vsink")
+        # Build the pipeline element-by-element so we can hold a direct
+        # reference to the appsink (parse_launch embeds it in a bin and
+        # get_by_name returns None when the bin wraps a nested pipeline string).
+        pipe = Gst.ElementFactory.make("playbin", "live_player")
+        vsink = Gst.ElementFactory.make("appsink", "vsink")
+        if pipe is None or vsink is None:
+            raise RuntimeError("playbin or appsink not available")
+        vsink.set_property("emit-signals", True)
+        vsink.set_property("max-buffers", 2)
+        vsink.set_property("drop", True)
+        vsink.set_property("caps", Gst.Caps.from_string("video/x-raw,format=BGRA"))
+        pipe.set_property("video-sink", vsink)
+        pipe.set_property("uri", test_url if "://" in test_url else Gst.filename_to_uri(test_url))
         def _on_live_sample(sink, *_):
             global live_frames
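Reviewer note: the new `uri` line passes `test_url` through unchanged when it already contains a scheme and converts it with `Gst.filename_to_uri` otherwise, because playbin's `uri` property expects a URI rather than a bare path. The same branching can be illustrated without GStreamer installed. The sketch below is a pure-Python stand-in, assuming `pathlib` is an acceptable substitute for illustration only; `to_playbin_uri` is a hypothetical helper and is not claimed to match `Gst.filename_to_uri` in every edge case.

```python
from pathlib import Path


def to_playbin_uri(path_or_uri):
    """Return a URI suitable for a `uri` property.

    Strings that already contain a scheme separator are passed through
    unchanged; anything else is treated as a local path, made absolute,
    and converted to a file:// URI.
    """
    if "://" in path_or_uri:
        return path_or_uri
    return Path(path_or_uri).resolve().as_uri()


print(to_playbin_uri("http://example.invalid/video.mkv"))
print(to_playbin_uri("clip.mkv"))
```

The first call returns its input as-is; the second yields a `file://` URI rooted at the current working directory.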
