- Remove fixed SDL8_SCALE_H; the capsfilter now uses a width-only constraint
  (same as the app), so GStreamer derives the height from the source DAR.
- Texture created lazily on first frame with correct dimensions instead of
a fixed 640x480 that would mismatch an AR-preserving 640x360 frame.
- SDL_RenderCopy now letterboxes the frame into the window (preserves AR)
instead of stretching to fill, matching what _fit_frame_to_viewport does.
- [texture] log line reports the actual w x h and aspect ratio for verification.
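As a rough illustration, the letterbox fit described above can be computed as follows. This is a minimal sketch; the real _fit_frame_to_viewport may differ in details such as rounding:

```python
def fit_frame_to_viewport(frame_w, frame_h, view_w, view_h):
    """Largest rect with the frame's aspect ratio that fits the viewport, centred."""
    scale = min(view_w / frame_w, view_h / frame_h)
    dst_w, dst_h = int(frame_w * scale), int(frame_h * scale)
    # Centre the rect; the remaining area becomes the letterbox/pillarbox bars.
    dst_x, dst_y = (view_w - dst_w) // 2, (view_h - dst_h) // 2
    return dst_x, dst_y, dst_w, dst_h

fit_frame_to_viewport(640, 360, 640, 480)  # → (0, 60, 640, 360): 60 px bars top/bottom
```

With an AR-preserving 640x360 frame in a 640x480 window, SDL_RenderCopy gets a centred 640x360 dest rect instead of stretching to 640x480.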
A height range like (int)[2,2160] includes the source height (1080),
so GStreamer's caps fixation picked identity for height (no scale) and
only scaled width 1920->720, giving a distorted 720x1080 frame.
Fix: omit height entirely from the capsfilter caps string.
GStreamer then derives the output height from the source's display
aspect ratio for the given width target. NV12's even-dimension
requirement is satisfied automatically by caps fixation rounding.
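The height derivation the fix relies on is simple ratio math. A sketch assuming a 1920x1080 source and a 720-wide target; the exact rounding rule GStreamer's caps fixation applies is an assumption here, not taken from the source:

```python
src_w, src_h, target_w = 1920, 1080, 720
derived_h = target_w * src_h // src_w  # 405: height implied by the source DAR
even_h = derived_h & ~1                # 404: rounded down to an even value for NV12
```

The key point: with height omitted, the 720x405-ish result keeps the 16:9 shape, whereas the old range caps produced the distorted 720x1080.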
Constraining both width AND height caused GStreamer to stretch the video
to fill the target box, distorting the aspect ratio when the box was not
the same AR as the source (e.g. 720x600 vs a 16:9 source).
Fix: constrain only the width in the capsfilter, leaving the height as a
loose range (height=(int)[2,2160]).
GStreamer then picks the height from the source's native DAR, naturally
preserving aspect ratio without relying on add-borders.
_fit_frame_to_viewport centres the resulting frame in the SDL viewport.
- Scale target is now the actual video area (window minus HUD margins)
instead of the full window size; dimensions rounded to even for NV12.
- Set add-borders=True so videoscale letterboxes/pillarboxes the source
rather than stretching it, preserving the original aspect ratio.
- Add pixel-aspect-ratio=1/1 in capsfilter so downstream treats output
pixels as square and _fit_frame_to_viewport works correctly.
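A sketch of the scale-target computation described above; the margin parameter names are hypothetical, only the even-rounding requirement and the pixel-aspect-ratio=1/1 caps field come from the commit:

```python
def scale_target(win_w, win_h, hud_x, hud_y):
    """Video area after subtracting HUD margins, rounded down to even for NV12."""
    return (win_w - hud_x) & ~1, (win_h - hud_y) & ~1

w, h = scale_target(640, 480, 0, 41)  # → (640, 438): 439 rounded down to even
caps = f"video/x-raw,format=NV12,width={w},height={h},pixel-aspect-ratio=1/1"
```

Declaring square pixels downstream means _fit_frame_to_viewport can treat width/height as the true display shape.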
When hardware decode (mppvideodec/NV12) is active, wrap the appsink in a
GstBin with a videoscale element so the VPU decodes at full stream
resolution but Python only receives a frame pre-scaled to the SDL display
size (default 640x480).
Effect:
NV12 buffer per frame: 3,133,440 B (1080p) → 460,800 B (640x480)
memmove per frame: ~33 ms (80.5% budget) → ~5 ms (expected ~12%)
The videoscale bilinear step runs entirely in software on the A35 cores
but scales down 6.7×, so its cost is far lower than the avoided memmove.
SDL still handles final aspect-ratio fitting inside the viewport, so
visual quality is unchanged relative to what the 640x480 display can show.
Fallback: if videoscale is not available, unscaled NV12 is used as before.
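The buffer sizes quoted above follow directly from the NV12 layout (12 bits per pixel). The 3,133,440 B figure implies the decoder pads 1080 rows up to a 16-aligned 1088; that padding is an inference from the arithmetic, not stated in the source:

```python
def nv12_size(w, h):
    # Y plane (w*h bytes) + one interleaved UV plane at quarter resolution (w*h/2)
    return w * h * 3 // 2

nv12_size(1920, 1088)  # 3,133,440 B, matching the quoted 1080p figure
nv12_size(640, 480)    # 460,800 B, matching the scaled figure
```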
Add benchmark log table to development-status.md comparing:
- a201594: extract_dup+from_buffer_copy (2 copies, 6MB/frame) → 36.5ms, 87.6% budget
- da02e74: buffer.map+memmove into reusable ctypes array (1 copy, 3MB/frame) → 33.6ms, 80.5% budget
Note that the 3.1MB memmove is now the remaining bottleneck and further
reduction would require DMA-buf zero-copy via kernel VPU driver support.
Update next actions: profile SDL upload overhead, explore dmabuf fd path,
and consider 720p downscale option if stutter appears under combined load.
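The budget percentages in the table are consistent with a ~41.7 ms frame budget, i.e. 24 fps; that frame rate is inferred from the numbers, not stated explicitly:

```python
budget_ms = 1000 / 24  # ≈ 41.67 ms per frame at an assumed 24 fps

def pct_of_budget(t_ms):
    return round(t_ms / budget_ms * 100, 1)

pct_of_budget(36.5)  # 87.6, matching the a201594 row
pct_of_budget(33.6)  # 80.6, within rounding of the quoted 80.5
```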
Instead of extract_dup (GLib alloc+memcpy → Python bytes) followed by
from_buffer_copy (Python bytes → ctypes array) — two 3MB copies per frame —
use Gst.Buffer.map(READ) to get a zero-allocation pointer to the decoded
frame memory, then memmove directly into a pre-allocated reusable ctypes
array (_raw_arr).
This reduces the per-frame copy path from 2 copies (6MB) to 1 memmove
(3MB), with no Python bytes object allocation at all. The memmove happens
under _frame_lock so render() on the main thread never reads a partial frame.
_raw_arr is allocated once on the first frame (or on resolution change) and
reused for every subsequent frame.
_Frame no longer carries a pixels field. Tests updated accordingly.
Benchmark updated to use the same buffer.map+memmove path as the app.
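A minimal sketch of the single-copy path. The mapped Gst.Buffer pointer is simulated here with a bytes object (ctypes.memmove accepts bytes as a source); the FrameStore wrapper and its method name are illustrative, while _raw_arr and _frame_lock come from the commit:

```python
import ctypes
import threading

class FrameStore:
    def __init__(self):
        self._frame_lock = threading.Lock()
        self._raw_arr = None   # reusable destination; allocated on first frame
        self._size = 0

    def push(self, src, size):
        """One memmove per frame into the reusable ctypes array, under the lock."""
        with self._frame_lock:
            # (Re)allocate only on the first frame or on a resolution change.
            if self._raw_arr is None or self._size != size:
                self._raw_arr = (ctypes.c_uint8 * size)()
                self._size = size
            ctypes.memmove(self._raw_arr, src, size)

store = FrameStore()
store.push(b"\x10\x20\x30\x40", 4)  # in the app, src is the mapped buffer pointer
```

No intermediate Python bytes object is created, and render() on the main thread takes the same lock before reading, so it never sees a partial frame.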
mppvideodec outputs NV12 (hardware format) which GStreamer videoconvert
converts to BGRA in scalar software code — slower than avdec_h264 which
uses libav's NEON-optimised YUV→BGRA path.
Default behaviour: software decode (avdec_h264) at PRIMARY rank.
The MPP plugin is still detected and logged so the user knows it is
installed and operational.
Set R36S_HW_DECODE=1 to re-enable the rank boost once a zero-copy
NV12→SDL_UpdateNVTexture (or similar) upload path is implemented.
On linux-aarch64 the conda gst-libav package has an unfixable ABI mismatch
(libdav1d.so.6 missing, libicuuc.so.78 via libxml2-16). Fix: use system
gstreamer1.0-libav installed via apt with GST_PLUGIN_PATH, and preload
system libgomp.so.1 to avoid static TLS block errors when dlopen loads
libgstlibav.so. avdec_h264 and avdec_aac now register correctly on device.
These variables (GST_PLUGIN_PATH and the libgomp preload) are stored in
conda activate.d/gst-env.sh and in deploy/run.sh.
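A sketch of what gst-env.sh might contain; the Debian aarch64 multiarch paths are assumptions, not quoted from the source:

```shell
# Point GStreamer at the apt-installed plugins (path assumed: Debian multiarch)
export GST_PLUGIN_PATH=/usr/lib/aarch64-linux-gnu/gstreamer-1.0
# Preload the system libgomp so dlopen of libgstlibav.so does not hit
# the static TLS block limit
export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1
```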