Native R3D and BRAW Decoding: How We Built GPU-Accelerated Cinema Camera Support
Most video tools punt on professional RAW formats. We wrote Rust FFI bindings to the RED and Blackmagic SDKs with CUDA and Metal acceleration so FrameQuery can index cinema footage natively.
If you shoot on a RED or a Blackmagic camera, your footage lives in a proprietary RAW format that most software cannot open. RED uses .r3d files. Blackmagic uses .braw files. These are not MP4s or MOVs. They store raw sensor data that needs to be debayered (converted from a mosaic of single-colour pixels into full-colour images) before you can even see what you shot.
Most video management tools simply ignore these formats and tell you to transcode first. We decided to support them natively.
Why this matters
Professional video production runs on these cameras. A RED KOMODO or a Blackmagic Pocket Cinema Camera 6K generates terabytes of footage per shoot. If FrameQuery cannot read those files directly, we are asking editors to transcode their entire library before they can search it. That is a non-starter.
Native decode means you drop your .r3d or .braw files into FrameQuery and they just work. No pre-processing step. No intermediate files eating up disk space.
The R3D SDK
RED provides a C++ SDK for reading their proprietary format. It handles debayering and exposes metadata like resolution, frame count, duration, and audio streams.
Our integration works like this:
- C wrapper layer. The R3D SDK is C++, but our backend is Rust. We wrote a C wrapper (red_wrapper.h) that exposes a clean C ABI with opaque handle types for contexts and clips. This gives Rust's bindgen something it can work with cleanly.
- Rust FFI bindings. At compile time, bindgen generates Rust structs and function signatures from our C headers. The Rust decoder module (r3d_decoder.rs, roughly 1,500 lines) wraps these in safe APIs with proper error handling and resource cleanup (see the first sketch after this list).
- GPU-accelerated debayering. Raw R3D frames need debayering, and doing it on CPU is slow. We implemented three decode paths:
  - CUDA (Windows/Linux): Double-buffered pipeline. The CPU decodes raw frame data via the R3D SDK, then we copy it to GPU memory using CUDA pinned host memory (cudaMallocHost), which lets transfers run as direct DMA without an extra staging copy. The GPU debayers the frame while the CPU is already decoding the next one. We wrote RAII wrappers around CUDA allocations so pinned memory gets freed automatically when the Rust buffer goes out of scope (see the second sketch after this list).
  - Metal (macOS): Uses shared CPU/GPU memory buffers. The R3D SDK writes decoded data directly into a Metal-accessible buffer, and the GPU debayers asynchronously while the CPU moves on to the next frame. Same pipelining concept, different memory model.
  - CPU fallback: Full-resolution decode to 16-bit RGB interleaved format with 16-byte alignment for SIMD-friendly cache access. Slower, but works everywhere.
- Multiple resolution modes. The SDK supports decoding at full, half, quarter, eighth, and sixteenth resolution. For proxy generation and thumbnailing, we do not need full 8K frames. Decoding at quarter or eighth resolution is dramatically faster and still provides more than enough detail for search indexing.
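To make the FFI layer concrete, here is a minimal sketch of the opaque-handle pattern with Drop-based cleanup. The type and function names (R3DClip, r3d_clip_open, and so on) are illustrative stand-ins, not the SDK's or our wrapper's actual symbols:

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

// Opaque handle mirroring the C wrapper's `typedef struct R3DClip R3DClip;`.
// The private zero-sized field means Rust code can never construct or
// dereference one; only the wrapper's functions can touch it.
#[repr(C)]
pub struct R3DClip {
    _private: [u8; 0],
}

extern "C" {
    // Illustrative signatures for the C ABI the wrapper exposes.
    fn r3d_clip_open(path: *const c_char) -> *mut R3DClip;
    fn r3d_clip_frame_count(clip: *const R3DClip) -> c_int;
    fn r3d_clip_close(clip: *mut R3DClip);
}

/// Safe wrapper: owns the handle and frees it on Drop.
pub struct Clip {
    raw: *mut R3DClip,
}

impl Clip {
    pub fn open(path: &str) -> Result<Self, String> {
        let c_path = CString::new(path).map_err(|e| e.to_string())?;
        // SAFETY: c_path is a valid NUL-terminated string for the call's duration.
        let raw = unsafe { r3d_clip_open(c_path.as_ptr()) };
        if raw.is_null() {
            Err(format!("failed to open {path}"))
        } else {
            Ok(Clip { raw })
        }
    }

    pub fn frame_count(&self) -> i32 {
        // SAFETY: self.raw is non-null for the lifetime of Clip.
        unsafe { r3d_clip_frame_count(self.raw) }
    }
}

impl Drop for Clip {
    fn drop(&mut self) {
        // SAFETY: raw came from r3d_clip_open and is closed exactly once.
        unsafe { r3d_clip_close(self.raw) }
    }
}
```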
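The same RAII idea covers the pinned host memory in the CUDA path. cudaMallocHost and cudaFreeHost are the real CUDA runtime entry points (linked via cudart); the wrapper type around them is our own sketch:

```rust
use std::os::raw::{c_int, c_void};

extern "C" {
    // Real CUDA runtime functions; this only links where cudart is available.
    fn cudaMallocHost(ptr: *mut *mut c_void, size: usize) -> c_int;
    fn cudaFreeHost(ptr: *mut c_void) -> c_int;
}

/// Page-locked host buffer that the GPU's DMA engine can read directly.
pub struct PinnedBuffer {
    ptr: *mut c_void,
    len: usize,
}

impl PinnedBuffer {
    pub fn new(len: usize) -> Result<Self, c_int> {
        let mut ptr = std::ptr::null_mut();
        // SAFETY: cudaMallocHost writes a valid pointer on success (returns 0).
        let err = unsafe { cudaMallocHost(&mut ptr, len) };
        if err != 0 {
            return Err(err);
        }
        Ok(PinnedBuffer { ptr, len })
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: ptr points to len bytes of pinned host memory we own.
        unsafe { std::slice::from_raw_parts_mut(self.ptr as *mut u8, self.len) }
    }
}

impl Drop for PinnedBuffer {
    fn drop(&mut self) {
        // Pinned memory is freed automatically when the buffer goes out of scope.
        // SAFETY: ptr came from cudaMallocHost and is freed exactly once.
        unsafe {
            cudaFreeHost(self.ptr);
        }
    }
}
```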
The BRAW SDK
Blackmagic takes a different approach. Their SDK is freely available with permissive licensing (RED requires a separate license agreement). The BRAW SDK uses a COM-style interface on Windows and a similar object-oriented C API on macOS and Linux.
Our integration follows the same pattern with some BRAW-specific considerations:
- Pipeline auto-detection. On initialization, the BRAW SDK checks which GPU backends are available. We try CUDA first on Windows/Linux, Metal on macOS, and fall back to CPU. The SDK reports whether each pipeline is supported before we commit to it.
- Smart scale selection for proxies. When generating a proxy video, we calculate the minimum decode resolution that still gives us enough pixels for the target proxy size. If the proxy target is 1280x720 and the source is 6144x3456 (BRAW 6K), eighth resolution (768x432) is close enough: FFmpeg can upscale the rest of the way, and we save ourselves from processing roughly 50x more pixels than we need. (A sketch of this calculation follows the list.)
- Frame buffer pooling. Instead of allocating and freeing memory for every frame, we maintain a pool of three reusable buffers. The decode loop grabs a buffer from the pool, decodes into it, sends it to the consumer, and the consumer returns it when done. This eliminates allocation overhead during streaming decode. (A pooling sketch also follows the list.)
- Resilient thumbnail generation. BRAW thumbnails use the CPU pipeline specifically (not GPU) because it is more reliable for single-frame decode. If one frame times out, we try several alternatives: the first frame, the requested frame, and frames at roughly 25% and 75% through the clip. After two timeouts, we give up gracefully rather than hanging the app.
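Here is a rough sketch of the scale-selection calculation. The divisor set and the upscale slack factor are illustrative choices, not the exact production values:

```rust
/// Pick the largest decode divisor (1 = full res, 8 = eighth res) whose
/// output, allowing a modest FFmpeg upscale (`slack`), still covers the
/// proxy target. Divisors are ordered, so once one fails, all larger
/// divisors fail too.
fn pick_decode_divisor(src: (u32, u32), target: (u32, u32), slack: f64) -> u32 {
    let mut best = 1;
    for &div in &[1u32, 2, 4, 8] {
        let (w, h) = (src.0 / div, src.1 / div);
        if (w as f64) * slack >= target.0 as f64 && (h as f64) * slack >= target.1 as f64 {
            best = div; // most aggressive divisor that still qualifies
        } else {
            break; // smaller outputs from here on; stop searching
        }
    }
    best
}

fn main() {
    // BRAW 6K source, 720p proxy target: with ~1.7x upscale slack this
    // picks eighth resolution (768x432), matching the example above.
    assert_eq!(pick_decode_divisor((6144, 3456), (1280, 720), 1.7), 8);
    println!("divisor = {}", pick_decode_divisor((6144, 3456), (1280, 720), 1.7));
}
```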
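And a minimal sketch of the three-buffer pool, using a guard that returns its buffer on Drop. The API shape and buffer size are illustrative:

```rust
use std::sync::{Arc, Mutex};

/// Fixed-size pool of reusable frame buffers.
struct BufferPool {
    free: Mutex<Vec<Vec<u8>>>,
}

/// Guard that hands its buffer back to the pool when dropped.
struct PooledBuffer {
    buf: Option<Vec<u8>>,
    pool: Arc<BufferPool>,
}

impl BufferPool {
    fn new(count: usize, size: usize) -> Arc<Self> {
        Arc::new(BufferPool {
            free: Mutex::new((0..count).map(|_| vec![0u8; size]).collect()),
        })
    }

    /// Take an idle buffer, if any; the decode loop waits otherwise.
    fn acquire(pool: &Arc<BufferPool>) -> Option<PooledBuffer> {
        let buf = pool.free.lock().unwrap().pop()?;
        Some(PooledBuffer { buf: Some(buf), pool: Arc::clone(pool) })
    }
}

impl std::ops::Deref for PooledBuffer {
    type Target = Vec<u8>;
    fn deref(&self) -> &Vec<u8> {
        self.buf.as_ref().unwrap()
    }
}

impl std::ops::DerefMut for PooledBuffer {
    fn deref_mut(&mut self) -> &mut Vec<u8> {
        self.buf.as_mut().unwrap()
    }
}

impl Drop for PooledBuffer {
    fn drop(&mut self) {
        // Recycle instead of freeing: no allocation in the steady state.
        if let Some(buf) = self.buf.take() {
            self.pool.free.lock().unwrap().push(buf);
        }
    }
}

fn main() {
    let pool = BufferPool::new(3, 1920 * 1080 * 8); // placeholder frame size
    {
        let mut frame = BufferPool::acquire(&pool).expect("a buffer is free");
        frame[0] = 42; // stand-in for decoding into the buffer
    } // guard drops here; the buffer returns to the pool
    assert_eq!(pool.free.lock().unwrap().len(), 3);
}
```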
The build system
Getting all of this to compile cross-platform is its own challenge. Our build.rs script handles:
- Running bindgen to generate Rust FFI bindings from C headers
- Compiling the C++ wrapper with cc (using Objective-C++ on macOS for Metal support)
- Detecting CUDA availability and linking cudart
- Linking platform-specific system libraries (COM libraries on Windows, Metal and Foundation frameworks on macOS)
- Generating Windows COM headers from .idl files via MIDL
The R3D SDK headers cannot be redistributed, so builds require environment variables pointing to a local copy. The BRAW SDK headers have permissive licensing and ship with our repo.
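Condensed to its essentials, the script looks roughly like this. The file names, the R3D_SDK_DIR and CUDA_PATH environment-variable handling, and the exact flags are illustrative stand-ins for the real logic (bindgen and cc are build-dependencies):

```rust
// build.rs — a condensed sketch of the real script.
use std::env;
use std::path::PathBuf;

fn main() {
    // The R3D SDK headers cannot ship with the repo, so the build reads
    // their location from the environment (variable name is illustrative).
    let r3d_sdk = env::var("R3D_SDK_DIR").expect("set R3D_SDK_DIR to your R3D SDK path");

    // Generate Rust FFI bindings from the C wrapper header.
    let bindings = bindgen::Builder::default()
        .header("src/red_wrapper.h")
        .clang_arg(format!("-I{r3d_sdk}/Include"))
        .generate()
        .expect("bindgen failed");
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out_dir.join("r3d_bindings.rs"))
        .expect("could not write bindings");

    // Compile the C++ wrapper; on macOS it is an Objective-C++ file so the
    // Metal path can be built in.
    let mut build = cc::Build::new();
    build.cpp(true).include(format!("{r3d_sdk}/Include"));
    let target_os = env::var("CARGO_CFG_TARGET_OS").unwrap();
    if target_os == "macos" {
        build.file("src/red_wrapper.mm");
        println!("cargo:rustc-link-lib=framework=Metal");
        println!("cargo:rustc-link-lib=framework=Foundation");
    } else {
        build.file("src/red_wrapper.cpp");
    }
    build.compile("red_wrapper");

    // Link the CUDA runtime when a toolkit is present (detection simplified).
    if let Ok(cuda) = env::var("CUDA_PATH") {
        println!("cargo:rustc-link-search=native={cuda}/lib64");
        println!("cargo:rustc-link-lib=cudart");
    }
}
```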
Performance
With GPU acceleration, proxy generation from cinema RAW files is fast enough to be practical. The double-buffering pipeline means we are rarely waiting. While the GPU is processing frame N, the CPU is already decoding frame N+1. Built-in timing statistics (tracked per-frame for decode, GPU transfer, and callback stages) help us identify bottlenecks.
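In sketch form, the double-buffered handoff looks like this: a bounded channel carries full buffers to the GPU stage, and a return channel recycles empty ones. The decode and debayer functions are stand-ins for the SDK and CUDA/Metal calls:

```rust
use std::sync::mpsc;
use std::thread;

fn cpu_decode(frame: usize, buf: &mut Vec<u8>) {
    // Stand-in for decoding raw frame data via the SDK.
    buf.clear();
    buf.extend(std::iter::repeat(frame as u8).take(1024));
}

fn gpu_debayer(frame: usize, _buf: &[u8]) {
    // Stand-in for the asynchronous GPU debayer.
    println!("debayered frame {frame}");
}

fn main() {
    // Capacity 1: at most one frame queued, so the stages stay in lockstep.
    let (to_gpu, on_gpu) = mpsc::sync_channel::<(usize, Vec<u8>)>(1);
    let (to_cpu, on_cpu) = mpsc::channel::<Vec<u8>>();

    // Seed two buffers so decode and debayer can overlap.
    to_cpu.send(Vec::with_capacity(1024)).unwrap();
    to_cpu.send(Vec::with_capacity(1024)).unwrap();

    let gpu = thread::spawn(move || {
        for (frame, buf) in on_gpu {
            gpu_debayer(frame, &buf);
            if to_cpu.send(buf).is_err() {
                break; // decoder has shut down
            }
        }
    });

    for frame in 0..8 {
        let mut buf = on_cpu.recv().unwrap(); // wait for a free buffer
        cpu_decode(frame, &mut buf);          // CPU decodes N+1 while GPU debayers N
        to_gpu.send((frame, buf)).unwrap();
    }
    drop(to_gpu); // close the stream so the GPU stage exits
    gpu.join().unwrap();
}
```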
The decoded frames stream directly into FFmpeg for MP4 encoding, using the best available hardware encoder: NVENC on NVIDIA, QSV on Intel, AMF on AMD, with libx264 as the universal fallback.
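The encoder names are FFmpeg's real ones. One plausible way to pick among them, sketched here as a simplified stand-in for the real selection logic, is to probe each candidate with a one-frame test encode and keep the first that succeeds:

```rust
use std::process::Command;

/// Try hardware H.264 encoders in priority order; fall back to libx264.
fn pick_h264_encoder() -> &'static str {
    for enc in ["h264_nvenc", "h264_qsv", "h264_amf"] {
        // Encode one synthetic black frame to null output; success means
        // the encoder (and its driver) actually works on this machine.
        let ok = Command::new("ffmpeg")
            .args([
                "-v", "error",
                "-f", "lavfi", "-i", "color=black:s=256x256:d=0.1",
                "-frames:v", "1", "-c:v", enc,
                "-f", "null", "-",
            ])
            .status()
            .map(|s| s.success())
            .unwrap_or(false);
        if ok {
            return enc;
        }
    }
    "libx264" // universal software fallback
}

fn main() {
    println!("using encoder: {}", pick_h264_encoder());
}
```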
What this enables
With native R3D and BRAW support, FrameQuery can index cinema footage the same way it indexes any other video. Drop in your camera originals, and the processing pipeline handles everything: decode the raw frames, extract transcripts from the audio track, run object detection, cluster faces, generate scene descriptions. All of that gets saved to a local index you can search instantly.
No transcoding step. No waiting hours for ProRes exports before you can even start searching.
Join the waitlist to try it with your own camera originals.