Replace extract_dup (GLib alloc+memcpy → Python bytes) followed by
from_buffer_copy (Python bytes → ctypes array), which cost two 3MB copies
per frame. Instead, map the buffer with Gst.Buffer.map(Gst.MapFlags.READ)
to get a pointer to the decoded frame memory without allocating, then
memmove directly into a pre-allocated, reusable ctypes array (_raw_arr).
This reduces the per-frame copy path from 2 copies (6MB) to 1 memmove
(3MB), with no Python bytes object allocation at all. The memmove happens
under _frame_lock so render() on the main thread never reads a partial frame.
_raw_arr is allocated once on the first frame (or on resolution change) and
reused for every subsequent frame.
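
The allocate-once, memmove-under-lock pattern described above can be
sketched roughly as follows. This is a minimal illustration, not the app's
actual code: FrameSink, store() and snapshot() are hypothetical names, and
a plain ctypes array stands in for the mapped Gst.Buffer memory so the
sketch runs without GStreamer.

```python
import ctypes
import threading


class FrameSink:
    """Hypothetical holder for the reusable frame array (illustrative only)."""

    def __init__(self):
        self._frame_lock = threading.Lock()
        self._raw_arr = None  # allocated lazily on the first frame
        self._size = 0

    def store(self, src_addr, size):
        """Copy `size` bytes from the address `src_addr` into _raw_arr."""
        with self._frame_lock:
            # Reallocate only on the first frame or when the size changes.
            if self._raw_arr is None or self._size != size:
                self._raw_arr = (ctypes.c_ubyte * size)()
                self._size = size
            # Single copy, with no intermediate Python bytes object.
            ctypes.memmove(self._raw_arr, src_addr, size)

    def snapshot(self):
        """Return a copy of the current frame, taken under the same lock."""
        with self._frame_lock:
            return bytes(self._raw_arr) if self._raw_arr is not None else b""


# Stand-in for mapped frame memory (assumption: the real source is the
# address obtained from the mapped buffer).
fake_frame = (ctypes.c_ubyte * 4)(1, 2, 3, 4)
sink = FrameSink()
sink.store(ctypes.addressof(fake_frame), len(fake_frame))
```

Holding the lock across the memmove is what keeps a reader from observing
a half-written frame; the cost is that the reader waits for the duration
of one copy.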
_Frame no longer carries a pixels field. Tests updated accordingly.
Benchmark updated to use the same buffer.map+memmove path as the app.