- Section 6 _on_live_sample: buf.map() returns an (ok, map_info) tuple;
  accessing .size on the tuple raised an AttributeError on every frame
  (22 tracebacks in the log). Fixed to unpack the tuple properly.
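The failure mode can be reproduced without GStreamer using a stub whose map() mirrors the PyGObject signature (in PyGObject, Gst.Buffer.map(Gst.MapFlags.READ) returns a (bool, Gst.MapInfo) tuple); the stub and names here are illustrative, not the app's code:

```python
# Minimal reproduction of the tuple-unpack bug using a stub; the StubBuffer
# stands in for Gst.Buffer, whose PyGObject map() returns (ok, map_info).
from collections import namedtuple

MapInfo = namedtuple("MapInfo", ["data", "size"])

class StubBuffer:
    """Stand-in for Gst.Buffer; map() mirrors the PyGObject return shape."""
    def __init__(self, data: bytes):
        self._data = data
    def map(self, flags=None):
        return True, MapInfo(self._data, len(self._data))  # (ok, map_info)
    def unmap(self, info):
        pass

buf = StubBuffer(b"\x00" * 16)

# Buggy: treating the return value as the MapInfo itself
result = buf.map()
try:
    result.size  # AttributeError: 'tuple' object has no attribute 'size'
except AttributeError:
    pass

# Fixed: unpack the tuple and check the ok flag before using map_info
ok, info = buf.map()
assert ok
print(info.size)  # → 16
buf.unmap(info)
```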
- Section 8 WARMUP raised from 5 to 30 frames (~1.25 s) so cold-start DRM
  DMA setup, network buffer fill, and lazy texture init all complete
  before stats are recorded. Eliminates the 36 ms first-upload spike
  from post-warmup measurements.
- _stat() now shows mean / p95 / max so isolated spikes are visible
without inflating the headline figure.
GStreamer caps fixation keeps the source's value for any unconstrained
dimension (width-only caps leave the source height unchanged, giving 640x1080
instead of 640x360 for a 1920x1080 source).
Fix: compute a 16:9 output box that fits inside the video area, use both
width and height in the capsfilter, and set add-borders=True so GStreamer
letterboxes or pillarboxes any non-16:9 source without distortion.
For the test device (720x720 KMSDRM, ~120px HUD):
video area: ~720x600 → scale target: 720x404 (16:9)
For default viewport (640x480):
video area: 640x480 → scale target: 640x360 (16:9)
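The fit computation behind both examples can be sketched as the largest 16:9 box inside the available video area (function name and even-rounding are illustrative; the resulting width/height go into the capsfilter):

```python
# Sketch of the 16:9 fit: largest 16:9 box that fits inside the available
# video area, rounded down to even dimensions as YUV formats prefer.
def fit_16x9(area_w: int, area_h: int) -> tuple:
    """Return (w, h) of the largest 16:9 box inside area_w x area_h."""
    if area_w * 9 <= area_h * 16:
        # Area is taller than 16:9: width is the limit.
        w, h = area_w, area_w * 9 // 16
    else:
        # Area is wider than 16:9: height is the limit.
        w, h = area_h * 16 // 9, area_h
    return w - (w % 2), h - (h % 2)  # keep both dimensions even

print(fit_16x9(720, 600))  # → (720, 404)  test device, 720x720 minus HUD
print(fit_16x9(640, 480))  # → (640, 360)  default viewport
```

With both dimensions fixed in the caps, videoscale's add-borders=True handles any non-16:9 source by letterboxing or pillarboxing rather than distorting.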
Section 8 test updated to mirror the same 16:9+add-borders strategy.
- Remove fixed SDL8_SCALE_H; capsfilter now uses width-only (same as app)
so GStreamer derives height from source DAR.
- Texture created lazily on first frame with correct dimensions instead of
a fixed 640x480 that would mismatch an AR-preserving 640x360 frame.
- SDL_RenderCopy now letterboxes the frame into the window (preserves AR)
instead of stretching to fill, matching what _fit_frame_to_viewport does.
- [texture] log line reports actual w x h and AR ratio for verification.
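The letterboxed destination rect for the render copy can be sketched as below; the helper name is illustrative, mirroring what _fit_frame_to_viewport is described as doing:

```python
# Sketch of the letterbox/pillarbox destination rect used when copying the
# texture into the window; the function name is illustrative.
def letterbox_rect(frame_w: int, frame_h: int, win_w: int, win_h: int) -> tuple:
    """Largest rect with the frame's aspect ratio, centered in the window."""
    scale = min(win_w / frame_w, win_h / frame_h)
    dst_w, dst_h = int(frame_w * scale), int(frame_h * scale)
    x = (win_w - dst_w) // 2
    y = (win_h - dst_h) // 2
    return x, y, dst_w, dst_h

# A 640x360 AR-preserving frame in a 640x480 window gets 60px bars top/bottom
print(letterbox_rect(640, 360, 640, 480))  # → (0, 60, 640, 360)
```

The returned rect is what would be passed as the destination to SDL_RenderCopy instead of NULL (which stretches to fill).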
When hardware decode (mppvideodec/NV12) is active, wrap the appsink in a
GstBin with a videoscale element so the VPU decodes at full stream
resolution but Python only receives a frame pre-scaled to the SDL display
size (default 640x480).
Effect:
NV12 buffer per frame: 3,133,440 B (1080p) → 460,800 B (640x480)
memmove per frame: ~33 ms (80.5% of budget) → ~5 ms (expected ~12%)
The videoscale bilinear step runs entirely in software on the A35 cores
but scales down 6.7×, so its cost is far lower than the avoided memmove.
SDL still handles final aspect-ratio fitting inside the viewport, so
visual quality is unchanged relative to what the 640x480 display can show.
Fallback: if videoscale is not available, unscaled NV12 is used as before.
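The buffer-size figures above follow from NV12's layout (a full-resolution Y plane plus a half-resolution interleaved UV plane, 1.5 bytes per pixel); the 1080p number matches a VPU that pads height to a multiple of 16, which is an assumption about this decoder:

```python
# NV12 frame size: Y plane (w*h) + interleaved UV plane (w*h/2).
# The align parameter models the VPU padding decode height to a
# multiple of 16 (1080 → 1088 rows), an assumption that reproduces
# the 3,133,440 B figure above.
def nv12_size(width: int, height: int, align: int = 1) -> int:
    padded_h = (height + align - 1) // align * align
    return width * padded_h * 3 // 2

print(nv12_size(1920, 1080, align=16))  # → 3133440
print(nv12_size(640, 480))              # → 460800
print(nv12_size(1920, 1080, align=16) / nv12_size(640, 480))  # → 6.8
```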
Instead of extract_dup (GLib alloc+memcpy → Python bytes) followed by
from_buffer_copy (Python bytes → ctypes array) — two 3MB copies per frame —
use Gst.Buffer.map(READ) to get a zero-allocation pointer to the decoded
frame memory, then memmove directly into a pre-allocated reusable ctypes
array (_raw_arr).
This reduces the per-frame copy path from 2 copies (6MB) to 1 memmove
(3MB), with no Python bytes object allocation at all. The memmove happens
under _frame_lock so render() on the main thread never reads a partial frame.
_raw_arr is allocated once on the first frame (or on resolution change) and
reused for every subsequent frame.
_Frame no longer carries a pixels field. Tests updated accordingly.
Benchmark updated to use the same buffer.map+memmove path as the app.