- memmove is not the bottleneck (2.8% budget, 1.2ms mean).
- `SDL_UpdateNVTexture` for the 640×480 NV12 texture costs ~4.5ms mean (10.8%).
- `SDL_RenderPresent` costs ~4.5ms mean (10.8%) with spikes to 18ms (KMSDRM vsync stall).
- Total render overhead visible to the main thread: ~10ms, well within the 41.7ms budget.
- **The app-level desync is NOT caused by frame copy or SDL upload time. Likely root cause: `SDL_RenderPresent` blocks the main thread for up to 18ms, which delays the GIL release and can starve the GStreamer callback thread. This is a main-loop scheduling issue, not a per-frame cost issue.**
- With 24.5% of the budget used (section 8), ~31ms per frame remains, which is sufficient for a HUD render pass on top of video.
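The percentages above follow from the 24 fps frame period (1000 / 24 ≈ 41.7 ms). The small Python sketch below recomputes them from the mean costs. It also shows the remaining headroom and the total when `SDL_RenderPresent` hits its 18ms spike instead of its mean. It only restates the listed numbers. Because the inputs are rounded means, the last digit can differ slightly (memmove comes out at 2.9% here rather than 2.8%).

```python
# Frame-budget arithmetic for the findings above (24 fps stream).
FPS = 24
budget_ms = 1000.0 / FPS  # ~41.67 ms per frame

# Mean per-frame costs on the main thread, taken from the findings list.
costs_ms = {
    "memmove": 1.2,
    "SDL_UpdateNVTexture": 4.5,
    "SDL_RenderPresent": 4.5,
}


def budget_share(cost_ms: float, budget: float = budget_ms) -> float:
    """Percentage of the per-frame budget consumed by one cost."""
    return 100.0 * cost_ms / budget


total_ms = sum(costs_ms.values())
for name, ms in costs_ms.items():
    print(f"{name:22s} {ms:4.1f} ms  {budget_share(ms):5.1f}%")
print(f"{'total':22s} {total_ms:4.1f} ms  {budget_share(total_ms):5.1f}%")
print(f"headroom: {budget_ms - total_ms:.1f} ms")

# Spike case: RenderPresent takes 18 ms instead of its 4.5 ms mean.
worst_ms = total_ms - costs_ms["SDL_RenderPresent"] + 18.0
print(f"spike total: {worst_ms:.1f} ms ({budget_share(worst_ms):.1f}%)")
```

Even the spike case (1.2 + 4.5 + 18 = 23.7 ms) fits inside 41.7 ms. This is consistent with treating the desync as a scheduling problem rather than a per-frame cost problem.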
**Optimization history:**
- `a201594` → `da02e74`: replaced `extract_dup + from_buffer_copy` (2 copies, 6 MB/frame) with `buffer.map(READ) + memmove` into a pre-allocated ctypes array (1 copy, 3.1 MB). This removed ~3 MB/frame of allocation. Copy cost fell by 8% but still consumed ~81% of the budget.
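The single-copy pattern from the history entry above can be sketched in plain Python ctypes: allocate one destination array up front, then `memmove` each frame into it. In this sketch a `bytes` object stands in for the mapped GStreamer buffer. In the app the source would come from `buffer.map(Gst.MapFlags.READ)`, and the exact Python type of the mapped data depends on the PyGObject/gst-python version, so treat that hand-off as an assumption. `FrameBuffer` and `nv_planes()` are hypothetical helpers. `nv_planes()` returns the plane addresses and pitches in the order `SDL_UpdateNVTexture` takes them, assuming a tightly packed frame with no row padding. For reference, a packed 1920×1080 NV12 frame is 3,110,400 bytes (the 3.1 MB above), and 640×480 is 460,800 bytes.

```python
import ctypes


def nv12_size(width: int, height: int) -> int:
    """Bytes in a tightly packed NV12 frame: full-res Y plane + half-res interleaved UV plane."""
    return width * height + 2 * (width // 2) * (height // 2)


class FrameBuffer:
    """Hypothetical helper: one destination buffer allocated once and reused for every frame."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        self.size = nv12_size(width, height)
        self.data = (ctypes.c_ubyte * self.size)()  # allocated once, not per frame

    def copy_from(self, src: bytes) -> None:
        """Single copy from the mapped frame bytes into the pre-allocated array."""
        if len(src) < self.size:
            raise ValueError(f"frame too small: {len(src)} < {self.size} bytes")
        ctypes.memmove(self.data, src, self.size)

    def nv_planes(self):
        """(Yplane, Ypitch, UVplane, UVpitch) as raw addresses/ints, in SDL_UpdateNVTexture order.
        Assumes stride == width (no row padding)."""
        base = ctypes.addressof(self.data)
        y_pitch = self.width
        uv_pitch = 2 * (self.width // 2)  # interleaved U,V byte pairs per row
        uv_plane = base + self.width * self.height  # UV plane starts right after Y
        return base, y_pitch, uv_plane, uv_pitch


if __name__ == "__main__":
    fb = FrameBuffer(640, 480)
    frame = bytes(range(256)) * (fb.size // 256)  # stand-in for mapped buffer data
    fb.copy_from(frame)
    print(fb.size, bytes(fb.data) == frame)
```

If the decoder output has row padding, the pitches would have to come from the negotiated video info strides rather than from `width`.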
## Next Recommended Actions
1. **Investigate `SDL_RenderPresent` blocking.** The 18ms spike in `SDL_RenderPresent` (KMSDRM vsync stall) is the likely root cause of sync jitter in the full app. Options:
   - Move the render call off the main thread into a dedicated render thread, giving the GStreamer callback thread uncontested GIL access.
   - Or call `SDL_SetRenderVSync(renderer, 0)` to disable vsync and drive timing manually from GStreamer PTS, at the cost of tearing risk. (This is the SDL3 name; on SDL2 2.0.18+ the equivalent is `SDL_RenderSetVSync`.)
   - Or render only when `has_new_frame()` returns true, and otherwise sleep in short intervals, so the main loop does not enter the blocking `SDL_RenderPresent` when there is nothing new to show.
2. Run a visual smoke test via the MatHacks.sh launcher to confirm the HUD renders cleanly alongside video under KMSDRM with the videoscale path active (nearest-neighbour 640×480 NV12).
3. `SDL_UpdateNVTexture` for 640×480 NV12 costs ~4.5ms mean, which is acceptable. No further optimization is needed here.
4. `avdec_hevc` is still missing (HEVC decoders are not in system apt `gstreamer1.0-libav 1.16.1`). `mppvideodec` handles H.264/H.265/VP8/VP9 via HW, so this is not critical.
5. If visual quality from nearest-neighbour scaling is noticeably poor on-device, switch to bilinear with `scale.set_property("method", 1)` and re-benchmark. The bilinear result (20.92 fps, 46 drops) applies only to the benchmark stream. Actual app playback may differ, because the GStreamer pipeline inside the real app is structured slightly differently from the benchmark's.
6. Profile the SDL render loop under combined video+HUD load to confirm 30+ fps UI responsiveness alongside decoding.
7. Investigate DMA-buf import as a future zero-copy path. gst-mpp may expose DRM DMA-buf fds that could be imported through a custom EGL path (e.g. `EGL_EXT_image_dma_buf_import`), eliminating the CPU memmove and SW scale entirely. `SDL_CreateTextureFromSurface` is not a zero-copy route, since it uploads from a CPU-side `SDL_Surface`. This is a significant engineering effort and is not needed given current performance.
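The `has_new_frame()` gating option listed above can be sketched in plain Python. `FrameSlot`, `publish()`, `take()` and `run_main_loop()` are hypothetical names, not the app's code. `render(frame)` stands in for the SDL upload/copy/present calls, so the sketch runs without SDL. It assumes the GStreamer callback thread hands frames to the main thread through a lock-protected single slot, where a newer frame overwrites one that has not been rendered yet. This only avoids presents when nothing has changed. A present that is issued can still stall on the KMSDRM vsync.

```python
import threading
import time


class FrameSlot:
    """Hypothetical single-slot mailbox between the GStreamer callback thread and the main loop."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame = None
        self._new = False

    def publish(self, frame) -> None:
        """Called from the streaming thread; overwrites any frame not yet rendered."""
        with self._lock:
            self._frame = frame
            self._new = True

    def has_new_frame(self) -> bool:
        with self._lock:
            return self._new

    def take(self):
        """Return the latest frame and clear the new-frame flag."""
        with self._lock:
            self._new = False
            return self._frame


def run_main_loop(slot, render, should_stop, idle_sleep_s=0.002) -> int:
    """Render only when a new frame exists; otherwise sleep briefly instead of presenting.

    render(frame) stands in for SDL_UpdateNVTexture + SDL_RenderCopy + SDL_RenderPresent.
    Returns the number of frames rendered.
    """
    rendered = 0
    while not should_stop():
        if slot.has_new_frame():
            render(slot.take())
            rendered += 1
        else:
            time.sleep(idle_sleep_s)  # short idle sleep; no present when nothing changed
    return rendered


if __name__ == "__main__":
    slot = FrameSlot()
    done = threading.Event()

    def producer() -> None:
        # Stand-in for the appsink callback publishing decoded frames.
        for i in range(24):
            slot.publish(i)
            time.sleep(0.001)
        done.set()

    t = threading.Thread(target=producer)
    t.start()
    n = run_main_loop(slot, lambda f: None, lambda: done.is_set() and not slot.has_new_frame())
    t.join()
    print("rendered", n, "of 24 frames")
```

`idle_sleep_s` trades wake-up latency against CPU use. Frames published faster than the loop renders are dropped rather than queued, which keeps the displayed frame as fresh as possible.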