Add benchmark log table to development-status.md comparing:
- a201594: extract_dup+from_buffer_copy (2 copies, 6MB/frame) → 36.5ms, 87.6% budget
- da02e74: buffer.map+memmove into reusable ctypes array (1 copy, 3MB/frame) → 33.6ms, 80.5% budget
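
The per-frame sizes and budget percentages above can be cross-checked with a little arithmetic. This is a rough sketch: the 1920x1080 4:2:0 frame format and the 24 fps budget are assumptions inferred from the logged numbers, not values stated in the log.

```python
# Assumptions (not stated in the log): 1920x1080 frames in a 4:2:0 format
# (12 bits per pixel) and a 24 fps frame budget.
WIDTH, HEIGHT = 1920, 1080
FPS = 24

frame_bytes = WIDTH * HEIGHT * 3 // 2   # bytes in one 4:2:0 frame
budget_ms = 1000.0 / FPS                # time available per frame


def budget_share(frame_ms: float) -> float:
    """Return a frame time as a percentage of the per-frame budget."""
    return 100.0 * frame_ms / budget_ms


print(f"frame size: {frame_bytes / 1e6:.2f} MB")
print(f"a201594: {budget_share(36.5):.1f}% of budget")
print(f"da02e74: {budget_share(33.6):.1f}% of budget")
```

Under these assumptions one frame is about 3.1MB, so two copies move roughly 6MB per frame, and the computed shares land within about 0.2 percentage points of the logged figures.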
Note that the remaining 3.1MB memmove per frame is now the bottleneck;
reducing it further would require DMA-buf zero-copy, which depends on
kernel VPU driver support.
Update next actions: profile SDL upload overhead, explore the DMA-buf fd
path, and consider a 720p downscale option if stutter appears under
combined load.
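
The single-copy pattern from da02e74 can be sketched as follows. This is a minimal illustration, not the code from the commit: `frame_store` and `copy_frame` are hypothetical names, the 1080p 4:2:0 frame size is an assumption, and a plain `bytes` object stands in for the data exposed by the mapped buffer.

```python
import ctypes

# Assumed frame size: 1920x1080 in a 4:2:0 format (about 3.1MB).
FRAME_BYTES = 1920 * 1080 * 3 // 2

# Allocated once and reused for every frame, so there is no per-frame
# allocation and only one copy per frame.
frame_store = (ctypes.c_ubyte * FRAME_BYTES)()


def copy_frame(mapped_data: bytes) -> int:
    """Copy one frame into the reusable array and return the bytes copied.

    `mapped_data` is a stand-in for the memory exposed by a mapped buffer.
    """
    n = min(len(mapped_data), FRAME_BYTES)  # never write past the array
    ctypes.memmove(frame_store, mapped_data, n)
    return n


if __name__ == "__main__":
    fake_frame = bytes([128]) * FRAME_BYTES
    print(copy_frame(fake_frame), "bytes copied")
```

Because the destination array outlives each frame, the cost per frame is the memmove alone, which is why that copy is what remains to be optimized.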

On linux-aarch64 the conda gst-libav package has an unfixable ABI mismatch
(libdav1d.so.6 is missing, and libicuuc.so.78 is required via libxml2-16).
Fix: use the system gstreamer1.0-libav package installed via apt, exposed
through GST_PLUGIN_PATH, and preload the system libgomp.so.1 to avoid static
TLS block errors when dlopen loads libgstlibav.so. With this change
avdec_h264 and avdec_aac register correctly on the device.
The GST_PLUGIN_PATH and preload settings are stored in the conda
activate.d/gst-env.sh and in deploy/run.sh.
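
For launching from Python rather than through the shell scripts, the same environment could be built as in the sketch below. The paths are the usual Debian/Ubuntu arm64 multiarch locations and are assumptions, as is the use of LD_PRELOAD as the preload mechanism; `gst_env` is a hypothetical helper, not part of the repository.

```python
import os

# Assumed Debian/Ubuntu arm64 multiarch locations; adjust for the device.
SYSTEM_GST_PLUGINS = "/usr/lib/aarch64-linux-gnu/gstreamer-1.0"
SYSTEM_LIBGOMP = "/usr/lib/aarch64-linux-gnu/libgomp.so.1"


def gst_env(base: dict) -> dict:
    """Return a copy of `base` with the system plugin path and preload added."""
    env = dict(base)

    def prepend(name: str, value: str, sep: str) -> None:
        # Put the new value first so it takes precedence over existing entries.
        old = env.get(name, "")
        env[name] = value + sep + old if old else value

    prepend("GST_PLUGIN_PATH", SYSTEM_GST_PLUGINS, os.pathsep)
    # Assumption: the preload is done with LD_PRELOAD (colon-separated list).
    prepend("LD_PRELOAD", SYSTEM_LIBGOMP, ":")
    return env


if __name__ == "__main__":
    env = gst_env(dict(os.environ))
    print(env["GST_PLUGIN_PATH"])
    print(env["LD_PRELOAD"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run`, for example when running `gst-inspect-1.0 avdec_h264` to confirm that the decoder registers.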