Behind the Scenes: How FrameQuery Indexes Your Videos
A look at the multi-pass pipeline that turns raw footage into an instantly searchable index, covering transcription, object detection, face recognition, and scene understanding.
When you drop a video into FrameQuery, the goal is simple: turn an opaque media file into something you can search with plain English. The pipeline has five stages.
Step 1: Lightweight proxy
We do not upload your full-resolution file. FrameQuery generates a compressed proxy locally and sends only that to our processing servers. Your original footage never leaves your machine.
This keeps upload times reasonable even for large camera originals. A 40 GB RED file becomes a manageable proxy without losing the visual information our models need.
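We have not published our exact encoder settings, but the idea is easy to sketch. Here is a minimal Python wrapper around ffmpeg that produces a small H.264 proxy; the 720p height, CRF 28, and file naming are illustrative assumptions, not our production values:

```python
import subprocess
from pathlib import Path

def make_proxy(source: Path, height: int = 720, crf: int = 28) -> Path:
    """Transcode a camera original into a compact H.264 proxy with ffmpeg.

    The resolution and CRF here are illustrative, not FrameQuery's
    production settings; the point is that only this proxy is uploaded.
    """
    proxy = source.with_suffix(".proxy.mp4")
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", str(source),
            "-vf", f"scale=-2:{height}",    # cap height, keep aspect ratio
            "-c:v", "libx264", "-crf", str(crf), "-preset", "fast",
            "-c:a", "aac", "-b:a", "96k",   # keep audio for transcription
            str(proxy),
        ],
        check=True,
    )
    return proxy
```

A constant-quality (CRF) encode rather than a fixed bitrate is the natural choice here: it spends bits where the frame is busy, which is exactly the detail downstream vision models care about.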
Step 2: Transcription
The audio track goes through speech-to-text with word-level timestamps. When you search for a phrase, we can point you to the exact second it was spoken, not just "somewhere in this file."
We also separate and label multiple speakers (diarization), so searching for "what did the director say about lighting" actually works.
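We have not disclosed which speech model we run, but the word-level timestamp idea maps directly onto open-source tooling. A minimal sketch using the openai-whisper package as a stand-in (the model size and audio filename are assumptions):

```python
import whisper  # pip install openai-whisper

# "base" is an illustrative model choice, not FrameQuery's production model.
model = whisper.load_model("base")
result = model.transcribe("proxy_audio.wav", word_timestamps=True)

# Flatten the segments into (start_seconds, word) pairs so a phrase
# match can be resolved to the exact second it was spoken.
timeline = [
    (word["start"], word["word"].strip())
    for segment in result["segments"]
    for word in segment.get("words", [])
]
```

Speaker labels would come from a separate diarization pass (pyannote.audio is one open-source option) merged onto this timeline by timestamp overlap.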
Step 3: Visual analysis
In parallel with transcription, we sample frames at adaptive intervals: sparsely while the picture holds steady, densely when it changes, with the thresholds tuned to the kind of footage (sketched after the list below).
Each sampled frame goes through:
- Object detection: identifying what is in the frame (people, products, vehicles, text overlays)
- Face detection and recognition: detecting faces and clustering them across the video so you can search by person
- Scene description: generating a natural-language summary of what is happening
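What "adaptive intervals" means in practice: a cheap frame-to-frame change detector decides when the content has moved on enough to be worth another sample. Here is a simplified OpenCV version of that idea; the gap and threshold values are illustrative, and the real heuristics also weigh footage type:

```python
import cv2

def sample_frames(path, min_gap=0.5, max_gap=5.0, change_thresh=0.3):
    """Yield (timestamp_s, frame), sampling densely when the image changes.

    Thresholds are illustrative, not FrameQuery's production values.
    """
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    prev_hist, last_t, idx = None, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = idx / fps
        idx += 1
        # Coarse grayscale histogram of a tiny thumbnail: a cheap change signal.
        gray = cv2.cvtColor(cv2.resize(frame, (64, 36)), cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [32], [0, 256])
        changed = (
            prev_hist is not None
            and cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            > change_thresh
        )
        if (
            last_t is None                          # always take the first frame
            or t - last_t >= max_gap                # never go too long without one
            or (changed and t - last_t >= min_gap)  # content changed: sample now
        ):
            last_t = t
            yield t, frame
        prev_hist = hist
    cap.release()
```

Only the frames this yields go through the heavier detection and description models, which is what keeps visual analysis affordable on long footage.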
Step 4: Index assembly
All of this data (transcript segments, detected objects with timestamps, face clusters, scene descriptions) gets assembled into a structured index file.
This index is optimized for fast local search. It uses vector embeddings for semantic matching (so "person talking about launch" matches even if nobody said the word "launch") alongside keyword matching for exact phrases.
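The index format itself is ours, but the shape of the idea looks like this: every transcript segment and scene description becomes a record with a timestamp, its raw text, and an embedding vector. A sketch using sentence-transformers as an illustrative stand-in for our embedding model (the model name, record fields, and output filename are all assumptions):

```python
import json
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative, not production

def build_index(records, out_path="video_index.npz"):
    """records: dicts like {"t": 12.4, "kind": "transcript",
    "text": "talking about the launch"} from the earlier stages."""
    vectors = model.encode(
        [r["text"] for r in records], normalize_embeddings=True
    )
    np.savez_compressed(
        out_path,
        vectors=vectors.astype(np.float32),  # for semantic matching
        meta=json.dumps(records),            # for keyword matching and display
    )
```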
Step 5: Local storage
The finished index is downloaded to your machine. From this point on, all search is local. No network requests, no API calls, no per-search costs.
The index files are compact, typically a few megabytes per hour of video, so they do not meaningfully impact your storage.
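Continuing the sketch above, a local query is one embedding call plus a dot product over the stored vectors, with a plain substring check covering exact phrases. No network round-trip anywhere:

```python
import json
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # same stand-in as above

def search(query, index_path="video_index.npz", top_k=5):
    """Hybrid local search: cosine similarity over unit-length embeddings,
    boosted when the query appears verbatim in the indexed text."""
    data = np.load(index_path)
    vectors = data["vectors"]                    # (n, d), unit-normalized
    records = json.loads(data["meta"].item())
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q                         # cosine similarity
    for i, r in enumerate(records):
        if query.lower() in r["text"].lower():   # exact-phrase boost
            scores[i] += 1.0
    best = scores.argsort()[::-1][:top_k]
    return [(records[i]["t"], records[i]["text"], float(scores[i]))
            for i in best]
```

Each result carries its timestamp, so clicking a hit can seek straight to that moment in the original footage.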
The result
After processing, you can search across all your indexed videos instantly. Type a query, get timestamped results, click to jump to the exact moment. It is the experience text search has had for decades, finally applied to video.
Join the waitlist to be among the first to try it.