Technology
Behind the Scenes: How FrameQuery Indexes Your Videos
A look at how FrameQuery turns raw footage into an instantly searchable index, covering transcription, object detection, face recognition, and scene understanding.
When you drop a video into FrameQuery, the goal is simple: turn an opaque media file into something you can search with plain English.
Your originals are never stored on our servers
We do not store your full-resolution file. For local files, FrameQuery generates a compressed proxy on your machine and sends only that to our processing servers. For files already on Google Drive or Dropbox, FrameQuery reads them directly to create the proxy. Either way, your originals are never stored on our infrastructure.
This keeps upload times reasonable even for large camera originals. A 40 GB RED file becomes a manageable proxy without losing the visual information needed for analysis.
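To make the idea concrete, here is a rough sketch of what proxy generation can look like with ffmpeg. FrameQuery's actual codec, resolution, and quality settings are not public, so every value below (the 540-pixel height, the CRF, the audio bitrate) is an illustrative assumption, not the real configuration.

```python
def build_proxy_command(src: str, dst: str, height: int = 540) -> list[str]:
    """Build an ffmpeg command that downscales a camera original into a
    lightweight proxy while keeping enough detail for analysis.
    All encoding parameters here are hypothetical."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",      # downscale, preserve aspect ratio
        "-c:v", "libx264", "-crf", "28",  # heavy but analysis-friendly compression
        "-c:a", "aac", "-b:a", "96k",     # compressed audio is enough for transcription
        dst,
    ]

cmd = build_proxy_command("A001_C002.R3D", "proxy.mp4")
```

Downscaling plus a high CRF is what turns tens of gigabytes of camera original into a proxy small enough to upload quickly.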
What happens during processing
The proxy goes through multiple AI models in parallel:
- Transcription: Speech-to-text with word-level timestamps. When you search for a phrase, FrameQuery points you to the exact second it was spoken. It handles multiple speakers and can distinguish between them.
- Object detection: Identifying what appears in the frame, from people and products to vehicles and text overlays.
- Face detection and recognition: Detecting faces and clustering them across the video so you can search by person, regardless of camera angle or lighting.
- Scene descriptions: Generating natural-language summaries of what is happening, so you can search by describing a scene in your own words.
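Since the four analysis passes above are independent of one another, they can run concurrently over the same proxy. The sketch below shows that fan-out pattern with stand-in analyzer functions; the function names and result shapes are hypothetical, not FrameQuery's real models or output format.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in analyzers: each returns a tagged result for the shared proxy.
def transcribe(proxy):      return {"kind": "transcript", "segments": []}
def detect_objects(proxy):  return {"kind": "objects", "detections": []}
def recognize_faces(proxy): return {"kind": "faces", "clusters": []}
def describe_scenes(proxy): return {"kind": "scenes", "descriptions": []}

def analyze(proxy_path: str) -> dict:
    """Run all analysis passes in parallel and collect results by kind."""
    passes = [transcribe, detect_objects, recognize_faces, describe_scenes]
    with ThreadPoolExecutor(max_workers=len(passes)) as pool:
        results = pool.map(lambda f: f(proxy_path), passes)
    return {r["kind"]: r for r in results}

index_data = analyze("proxy.mp4")
```

Running the passes in parallel means total processing time is bounded by the slowest model rather than the sum of all four.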
FrameQuery uses proprietary techniques to determine which parts of your footage need the most analysis and which can be handled more efficiently. This is how we keep processing fast and costs low without sacrificing search quality.
The result: a local search index
All of this data (transcript segments, detected objects with timestamps, face clusters, scene descriptions) gets assembled into a compact, searchable index file.
The index is downloaded to your machine. From this point on, all search is local. No network requests, no API calls, no per-search costs. The index files are compact, typically a few megabytes per hour of video, so they do not meaningfully impact your storage.
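A minimal sketch of what searching such a local index might look like: a query is matched against timestamped transcript segments entirely in memory, with no network calls. The field names and index layout here are assumptions for illustration; the real index format is not public.

```python
# Hypothetical index fragment: transcript segments with start/end times.
INDEX = {
    "transcript": [
        {"start": 12.4, "end": 15.1, "text": "welcome to the product launch"},
        {"start": 91.0, "end": 94.8, "text": "let's review the budget"},
    ],
}

def search(index: dict, query: str) -> list[tuple[float, str]]:
    """Return (timestamp, text) pairs whose text contains the query.
    Runs entirely locally against the in-memory index."""
    q = query.lower()
    return [(seg["start"], seg["text"])
            for seg in index["transcript"] if q in seg["text"].lower()]

hits = search(INDEX, "launch")
# → [(12.4, "welcome to the product launch")]
```

Because the whole index is a few megabytes per hour of video, a scan like this returns results effectively instantly.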
Search that understands what you mean
The index supports both exact keyword matching and semantic search. That means "person talking about launch" can match even if nobody in the video said the word "launch." You get the precision of exact search and the flexibility of natural-language queries, working together.
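One common way to combine the two signals is to score each candidate with both an exact-match check and a similarity measure, then blend them. The toy version below uses bag-of-words cosine similarity so it stays self-contained; a production semantic search would use learned embeddings instead, and the blending weight `alpha` is an illustrative assumption.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words term count."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query: str, text: str, alpha: float = 0.5) -> float:
    """Blend exact substring matching with a similarity score."""
    exact = 1.0 if query.lower() in text.lower() else 0.0
    semantic = cosine(vectorize(query), vectorize(text))
    return alpha * exact + (1 - alpha) * semantic
```

With real embeddings, the semantic term is what lets "person talking about launch" match a segment that never contains the word "launch."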
Try it yourself
After processing, you can search across all your indexed videos instantly. Type a query, get timestamped results, click to jump to the exact moment. It is the experience text search has had for decades, finally applied to video.
Join the waitlist to be among the first to try it.