- memmove is not the bottleneck (2.8% budget, 1.2ms mean).
- `SDL_UpdateNVTexture` for the 640×480 NV12 texture costs ~4.5ms mean (10.8%).
- `SDL_RenderPresent` costs ~4.5ms mean (10.8%) with spikes to 18ms (KMSDRM vsync stall).
- Total render overhead visible to the main thread: ~10ms, well within the 41.7ms budget.
- **The app-level desync is NOT caused by frame copy or SDL upload time. Root cause of desync: `SDL_RenderPresent` blocks the main thread for up to 18ms, which delays the GIL release and can starve the GStreamer callback thread. This is a main-loop scheduling issue, not a per-frame cost issue.**
- 24.5% budget used in section 8 means ~31ms remains — sufficient for a HUD render pass on top of video.
**Optimization history:**
- `a201594` → `da02e74`: replaced `extract_dup + from_buffer_copy` (2 copies, 6 MB/frame) with `buffer.map(READ) + memmove` into a pre-allocated ctypes array (1 copy, 3.1 MB). Saved ~3 MB/frame allocation; copy cost reduced by 8% but still ~81% of budget.
@ -105,9 +128,10 @@ Stream: 1920×1080 H.264 MKV @ 24 fps via MiniDLNA over LAN. Frame budget: 41.7
## Next Recommended Actions
1. Run a visual playback smoke test on device directly via the app launcher (MatHacks.sh) to confirm HUD and video render correctly together under KMSDRM with the videoscale path active (nearest-neighbour 640×480 NV12).
2. Measure SDL_UpdateNVTexture upload cost for the now-smaller 640×480 texture (was 1920×1080). If it is sub-millisecond, the render path is considered optimized.
3. If visual quality from nearest-neighbour scaling is noticeably poor on-device, switch `scale.set_property("method", 1)` (bilinear) and re-benchmark; the bilinear result (20.92 fps, 46 drops) only applied to the benchmark stream — actual app playback may behave differently since the GStreamer pipeline structure is slightly different inside the real app vs the benchmark.
4. Consider profiling the SDL render loop under combined video+HUD load to confirm 30+ fps UI responsiveness alongside decoding.
5. Investigate DMA-buf import as a future zero-copy path: gst-mpp may expose DRM DMA-buf fds that SDL's KMSDRM backend can import directly via `SDL_CreateTextureFromSurface` or a custom EGL path, eliminating the CPU memmove and SW scale entirely. This is a significant engineering effort and is not needed given current performance.
6. `avdec_hevc` is still missing (HEVC decoders not in system apt `gstreamer1.0-libav 1.16.1`); `mppvideodec` covers H.264/H.265/VP8/VP9 via HW so this is less critical now.
1. **Investigate SDL_RenderPresent blocking** — the 18ms spike in `SDL_RenderPresent` (KMSDRM vsync stall) is the likely root cause of sync jitter in the full app. Options:
- Move the render call off the main thread into a dedicated render thread, giving the GStreamer callback thread uncontested GIL access.
- Or call `SDL_SetRenderVSync(renderer, 0)` to disable vsync and drive timing manually from GStreamer PTS, at the cost of tearing risk.
- Or cap renders to only happen when `has_new_frame()` is true and otherwise sleep shorter intervals to avoid the long blocking RenderPresent.
2. Run a visual smoke test via MatHacks.sh launcher to confirm HUD renders cleanly alongside video under KMSDRM.
3. SDL_UpdateNVTexture for 640×480 NV12 costs ~4.5ms mean — acceptable. No further optimization needed here.
4. `avdec_hevc` is still missing; `mppvideodec` handles HEVC via HW so this is not critical.