Video Search API

Search inside videos programmatically

Build video search into your own products. FrameQuery's REST API provides multimodal video indexing and search, so you don't have to build the pipeline from scratch.

The Challenge

Building video search from scratch takes months

Video search needs transcription, object detection, scene analysis, face recognition, and a search engine to tie them together. Each piece is a project in itself. FrameQuery bundles the entire pipeline into a single API.

Capabilities

Four search modalities, one API

Transcription

Speech-to-text with word-level timestamps and automatic speaker diarization.
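To make "word-level timestamps" and "speaker diarization" concrete, here is a minimal sketch of how such a transcript could be searched for a phrase. The `Word` record and its field names are assumptions for illustration only; the actual response shape of the FrameQuery API is not documented here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    """One transcribed word (hypothetical shape, not the real API schema)."""
    text: str
    start: float   # seconds from the start of the video
    end: float     # seconds from the start of the video
    speaker: str   # diarization label such as "S1"


def find_phrase(words: List[Word], phrase: str) -> List[Tuple[float, float, str]]:
    """Return (start, end, speaker) for every occurrence of `phrase`."""
    target = [t.lower() for t in phrase.split()]
    if not target:
        return []
    # Normalize case and strip surrounding punctuation before comparing.
    norm = [w.text.lower().strip(".,!?;:\"'") for w in words]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if norm[i:i + len(target)] == target:
            first, last = words[i], words[i + len(target) - 1]
            hits.append((first.start, last.end, first.speaker))
    return hits


transcript = [
    Word("Roll", 12.0, 12.3, "S1"),
    Word("camera,", 12.3, 12.8, "S1"),
    Word("please.", 12.8, 13.2, "S1"),
    Word("Camera", 14.0, 14.4, "S2"),
    Word("rolling.", 14.4, 14.9, "S2"),
]
```

With this sample data, `find_phrase(transcript, "roll camera")` yields a single hit spanning 12.0 to 12.8 seconds for speaker S1, which is the kind of result that lets a player jump straight to the spoken moment.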

Object Detection

Identify people, vehicles, animals, props, and other objects frame by frame.

Scene Analysis

Natural language descriptions of scenes including shot type, composition, and dominant color.

Face Recognition

Detect, cluster, and identify faces across videos. Biometric processing runs on-device in the desktop application.

Integration

Three steps to searchable video

Submit

Send a video to the processing endpoint. 50+ formats supported, up to 50 GB per file.

Process

Cloud processing extracts transcripts, detects objects, analyzes scenes, and clusters faces. Expect about five minutes per hour of video.

Search

Query the results programmatically. Full-text search, semantic search, and visual similarity across your processed library.
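The three steps above could be sketched as a submit, poll, search loop. Because the API is not yet public, everything concrete here is an assumption: the paths `/videos` and `/search`, the `source`, `status`, `id`, and `results` fields, and the `send` transport callable are all hypothetical placeholders, not FrameQuery's documented interface.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical transport: takes (method, path, payload) and returns decoded JSON.
# In practice this would wrap an HTTP client plus authentication.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def index_and_search(
    send: Send,
    source: str,
    query: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> List[Dict[str, Any]]:
    """Submit one video, wait for processing to finish, then search it."""
    # Step 1: Submit. The path and payload shape are assumptions.
    video_id = send("POST", "/videos", {"source": source})["id"]

    # Step 2: Process. Poll until the assumed status field reports completion.
    for _ in range(max_polls):
        status = send("GET", f"/videos/{video_id}", {})["status"]
        if status == "ready":
            break
        if status == "failed":
            raise RuntimeError(f"processing failed for {video_id}")
        time.sleep(poll_seconds)
    else:
        # The loop ended without a break, so the video never became ready.
        raise TimeoutError(f"{video_id} not ready after {max_polls} polls")

    # Step 3: Search. Restrict the query to the video that was just indexed.
    response = send("POST", "/search", {"query": query, "video_ids": [video_id]})
    return response["results"]
```

Passing the transport in as a function keeps the sketch independent of any particular HTTP library and makes the flow easy to exercise against a stub before real credentials exist.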

Format Support

50+ formats, no pre-processing required

Submit R3D, BRAW, ProRes, MXF, H.264, H.265, and dozens of other formats directly. No transcoding needed. See all supported formats.

R3D, BRAW, ProRes, DNxHR, XAVC, MXF, CinemaDNG, H.264, H.265, AV1, MKV, MOV
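Before uploading, a client might run a quick local check against the 50 GB per-file limit. The sketch below makes two assumptions: that "50 GB" means decimal gigabytes, and that a small, illustrative set of file extensions stands in for the full compatibility list, which is not reproduced here.

```python
from pathlib import PurePath
from typing import List

MAX_BYTES = 50 * 10**9  # assumes decimal gigabytes; the exact limit may differ

# Illustrative subset of container extensions only, not the full supported list.
KNOWN_EXTENSIONS = {".r3d", ".braw", ".mov", ".mxf", ".mkv", ".mp4"}


def preflight(filename: str, size_bytes: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"{filename}: {size_bytes} bytes exceeds the 50 GB limit")
    ext = PurePath(filename).suffix.lower()
    if ext not in KNOWN_EXTENSIONS:
        problems.append(f"{filename}: extension {ext or '(none)'} not in the local list")
    return problems
```

A check like this only saves a failed upload; the service itself remains the authority on what it accepts.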

Pricing

Search is free. Pay only to process.

Every plan includes the full feature set.

Plan      Price      Included
Free      Free       Search only
Starter   $19/mo     10 hrs processing
Pro       $54/mo     50 hrs processing
Max       $228/mo    300 hrs processing
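For comparing plans, the effective price per processing hour follows directly from the figures above. This sketch assumes the listed hours are a monthly allowance, which the pricing does not state explicitly; the `cheapest_plan` helper is a hypothetical convenience, not part of any FrameQuery tooling.

```python
from typing import Optional

# (plan name, price in USD per month, processing hours included),
# ordered by ascending price. Hours are assumed to be per month.
PLANS = [
    ("Free", 0, 0),
    ("Starter", 19, 10),
    ("Pro", 54, 50),
    ("Max", 228, 300),
]


def cost_per_hour(price: float, hours: float) -> Optional[float]:
    """Effective USD per processing hour, or None when no hours are included."""
    return round(price / hours, 2) if hours else None


def cheapest_plan(hours_needed: float) -> Optional[str]:
    """Name of the lowest-priced plan that covers `hours_needed`, else None."""
    for name, _price, hours in PLANS:
        if hours >= hours_needed:
            return name
    return None


for name, price, hours in PLANS:
    rate = cost_per_hour(price, hours)
    print(name, "n/a" if rate is None else f"${rate:.2f}/hr")
```

The loop prints $1.90/hr for Starter, $1.08/hr for Pro, and $0.76/hr for Max, so the per-hour rate falls as the allowance grows.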

FAQ

Common questions

What video formats does the API support?

Over 50 formats including R3D, BRAW, ARRIRAW, ProRes, ProRes RAW, DNxHR, XAVC, MXF, CinemaDNG, H.264, H.265, AV1, and more. See the full list on our compatibility page.

How fast is video processing?

Processing takes approximately five minutes per hour of video. Exact times vary by resolution, codec complexity, and queue load.
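As a rough planning aid, the five-minutes-per-hour figure can be turned into an estimate for a batch of clips. This is a back-of-the-envelope sketch only: it assumes the rate scales linearly with duration and ignores the resolution, codec, and queue effects mentioned above.

```python
from typing import Iterable

MINUTES_PER_FOOTAGE_HOUR = 5.0  # approximate figure quoted above


def estimate_processing_minutes(durations_seconds: Iterable[float]) -> float:
    """Estimate total processing minutes for clips given their lengths in seconds."""
    total_hours = sum(durations_seconds) / 3600.0
    return total_hours * MINUTES_PER_FOOTAGE_HOUR
```

For example, a one-hour clip plus a thirty-minute clip comes to 1.5 hours of footage, or about 7.5 minutes of processing under these assumptions.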

Where is video processed?

Video is processed in the cloud. Lightweight proxies are used for analysis and deleted after processing. Your original files are never stored on our servers.

Is face recognition available via the API?

Face and voice recognition run on the desktop application using on-device models. The API provides transcription, object detection, and scene analysis.

Is the API available now?

FrameQuery is currently in pre-launch. Join the waitlist to be notified when the API becomes available.

Build video search, not the pipeline

Join the waitlist for API access.