AI Video Search Without Uploading Your Footage to the Cloud

Most AI video search tools require uploading your originals. FrameQuery keeps your footage local, processes only lightweight proxies in the cloud, and runs all biometric analysis on-device.

FrameQuery Team · 13 May 2026 · 5 min read

You want to make your footage searchable. You do not want to upload 8 TB of raw cinema files to a third-party server. You definitely do not want a cloud service running face recognition on your footage and storing the biometric data.

This tension between AI-powered search and data privacy is real, and it is the reason many professionals skip AI video tools entirely. They would rather scrub manually than hand their footage to a cloud platform.

There is a better approach: process locally where privacy matters most and use cloud resources only for the computationally expensive analysis that does not require access to your original files.

Why most AI video search tools require uploads

AI video analysis is computationally expensive. Running speech-to-text, object detection, and scene description models requires GPUs that most workstations do not have (or cannot dedicate to indexing while other work is happening). Cloud processing solves the compute problem.

The simplest architecture is to upload the original video file, process it on cloud GPUs, and return the results. This is how most AI video tools work. It is straightforward for the developer and functional for the user.

But it creates several problems:

Transfer time and bandwidth. Uploading terabytes of raw footage takes days even on fast connections. For facilities without high-speed upload bandwidth, it is impractical.

Storage costs. Cloud storage for raw video is expensive. R3D, BRAW, and ProRes files are large, and keeping them on remote servers adds ongoing cost.

Data exposure. Your footage now exists on someone else's infrastructure. For client work under NDA, legal proceedings, unreleased commercial content, or any footage involving identifiable individuals, this creates liability.

Vendor lock-in. If the service shuts down, changes pricing, or has a breach, your footage and metadata are entangled with their platform.

FrameQuery's approach: proxies up, originals stay

FrameQuery separates the compute problem from the data problem. Here is how:

Lightweight proxy generation. Before cloud processing, FrameQuery generates small proxy files from your footage. These proxies contain enough visual and audio information for AI analysis but are a fraction of the size of your originals. They are not suitable for editing or delivery. They are purpose-built for indexing.
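To make the proxy step concrete, here is a minimal sketch of how such a proxy could be produced with a standard ffmpeg invocation: downscaled video, low bitrate, mono compressed audio. The specific resolution and bitrate values are illustrative assumptions, not FrameQuery's actual settings.

```python
import shlex

def proxy_command(src: str, dst: str, height: int = 360, video_kbps: int = 500) -> list[str]:
    """Build an ffmpeg command that transcodes `src` into a small
    H.264 proxy suitable for AI indexing but not for editing.
    These settings are illustrative, not FrameQuery's."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",           # downscale, preserve aspect ratio
        "-c:v", "libx264", "-b:v", f"{video_kbps}k",
        "-c:a", "aac", "-ac", "1", "-b:a", "64k",  # mono, low-bitrate audio
        dst,
    ]

cmd = proxy_command("interview_A001.mov", "interview_A001_proxy.mp4")
print(shlex.join(cmd))
```

A 360p proxy at 500 kbps works out to roughly 4 MB per minute of footage, versus gigabytes per minute for raw cinema formats.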

Cloud processing of proxies only. The proxies are sent to cloud GPUs for transcription, object detection, and scene description analysis. Your original R3D, BRAW, ProRes, or MXF files never leave your machine and are never stored on FrameQuery's servers.

Results returned as metadata. The cloud sends back structured data: timestamped transcripts, object labels, scene descriptions. This metadata is compact (kilobytes, not gigabytes) and contains no video content.
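As a rough illustration of why this metadata is so compact, the sketch below serializes a hypothetical result payload for one clip. The field names and shapes are assumptions for illustration, not FrameQuery's actual schema.

```python
import json

# Hypothetical shape of the metadata a cloud pass might return.
# Field names here are illustrative, not FrameQuery's actual schema.
clip_metadata = {
    "clip": "interview_A001_proxy.mp4",
    "transcript": [
        {"start": 12.4, "end": 15.1, "text": "Let's start with your background."},
        {"start": 15.8, "end": 21.3, "text": "I joined the project in 2019."},
    ],
    "objects": [
        {"time": 13.0, "label": "person", "confidence": 0.97},
        {"time": 13.0, "label": "chair", "confidence": 0.88},
    ],
    "scenes": [
        {"start": 0.0, "end": 42.5, "description": "Two people seated in an office."},
    ],
}

payload = json.dumps(clip_metadata).encode("utf-8")
print(f"{len(payload)} bytes")  # well under a kilobyte for this clip
```

Even a long clip with a dense transcript stays in the kilobyte range, because text and timestamps compress what hours of video describe.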

Local index storage. The returned metadata is stored in a local Tantivy search index on your machine. Searching happens entirely on-device with no network calls.
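FrameQuery's actual index is built on Tantivy (a Rust search library), but the core idea can be shown with a toy Python stand-in: an inverted index mapping words to the clips and timestamps where they occur, so a query resolves entirely in-process with no network call.

```python
from collections import defaultdict

class LocalIndex:
    """Toy on-device inverted index over timestamped transcript
    segments. A simplified stand-in for a real Tantivy index: it
    maps each word to the (clip, timestamp) pairs where it occurs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, clip: str, start: float, text: str):
        for word in text.lower().split():
            self.postings[word.strip(".,!?'\"")].add((clip, start))

    def search(self, query: str):
        # Intersect the postings of every query term.
        sets = [self.postings[w] for w in query.lower().split()]
        return sorted(set.intersection(*sets)) if sets else []

idx = LocalIndex()
idx.add("A001.mov", 12.4, "Let's start with your background.")
idx.add("A001.mov", 15.8, "I joined the project in 2019.")
print(idx.search("project"))  # → [('A001.mov', 15.8)]
```

A real index adds ranking, stemming, and phrase queries, but the privacy property is the same: the lookup touches only local data structures.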

The result: you get GPU-accelerated AI analysis without uploading your originals.

Biometric processing stays on-device

Face and voice recognition require special handling because they produce biometric data. A face embedding is a mathematical representation of someone's face. A voice print captures the acoustic characteristics of someone's speech. Both are sensitive data categories under privacy law.

FrameQuery runs all face and voice recognition processing 100% on your local machine. No face data, no voice prints, and no biometric embeddings are sent to the cloud. The models run on your CPU and GPU, and the resulting data stays in your local search index.
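To illustrate what on-device recognition means in practice, the sketch below matches a detected face embedding against a locally stored gallery using cosine similarity. The vectors, names, and threshold are made up for illustration; real face embeddings are typically 128 to 512 dimensions, produced by a local model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 4-dim embeddings; names and vectors are made up.
local_gallery = {
    "Dana": [0.9, 0.1, 0.0, 0.2],
    "Lee": [0.1, 0.8, 0.3, 0.0],
}

def identify(embedding, threshold=0.8):
    """Match a detected face against locally stored embeddings.
    Everything runs in-process; nothing is transmitted."""
    best_name, best_score = None, -1.0
    for name, ref in local_gallery.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

print(identify([0.85, 0.15, 0.05, 0.18]))  # → Dana
```

The embeddings, the gallery, and the match results all live in local memory and the local index; the cloud never sees any of them.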

When you share or export an index, biometric data is never included. Face embeddings, voice embeddings, and person labels are automatically stripped. Only non-biometric metadata (descriptions, transcripts, tags, timestamps) travels with the shared index. If a recipient has access to the underlying footage, they can run their own recognition locally and assign their own labels.
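The export-time stripping can be sketched as a simple filter over each index entry. The field names below mirror the categories the article lists (face embeddings, voice embeddings, person labels), but the exact schema is an assumption.

```python
# Biometric fields that must never leave the machine. The exact
# field names are an assumption for illustration.
BIOMETRIC_FIELDS = {"face_embeddings", "voice_embeddings", "person_labels"}

def export_index_entry(entry: dict) -> dict:
    """Return a copy of an index entry with all biometric fields
    removed, leaving only shareable metadata."""
    return {k: v for k, v in entry.items() if k not in BIOMETRIC_FIELDS}

entry = {
    "clip": "A001.mov",
    "transcript": [{"start": 12.4, "text": "Let's start."}],
    "tags": ["interview", "office"],
    "face_embeddings": [[0.9, 0.1, 0.0, 0.2]],
    "person_labels": {"face_0": "Dana"},
}
shared = export_index_entry(entry)
print(sorted(shared))  # → ['clip', 'tags', 'transcript']
```

Because the stripping happens at export rather than at search, your own local copy keeps its face and voice labels intact.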

This is a deliberate architectural choice, not a performance optimization. Biometric data has legal implications that general video metadata does not.

Why this matters for specific industries

Legal and litigation. Video depositions, surveillance footage, and recorded evidence have chain-of-custody requirements. Cloud uploads create exposure of privileged content. Local processing preserves evidentiary integrity.

Corporate communications. Internal meeting recordings and HR documentation contain sensitive business information. AI search is valuable only if the content remains within the organization's infrastructure.

Documentary filmmaking. Documentary footage often includes vulnerable subjects or whistleblowers who consented to be filmed but not to have their images uploaded to a cloud platform. Local biometric processing respects those consent boundaries.

Broadcast news. Footage involving sources, unpublished investigations, and sensitive locations carries competitive and legal risks significant enough that many newsrooms avoid cloud-based AI tools entirely.

BIPA, GDPR, and biometric privacy law

Biometric privacy regulation is expanding. Illinois' Biometric Information Privacy Act (BIPA) requires informed consent before collecting biometric identifiers and imposes strict requirements on storage and sharing. GDPR classifies biometric data as a special category requiring explicit consent and data protection impact assessments.

For any organization processing video that contains identifiable faces, the question of where biometric analysis happens is not just a preference. It is a compliance consideration.

On-device biometric processing simplifies compliance. If face embeddings are generated and stored locally, they are not transmitted to or processed by a third party. The organization retains full control over the biometric data and can delete it without coordinating with a vendor.

This does not eliminate all compliance obligations. Organizations still need appropriate internal policies for handling biometric data. But it removes the third-party data sharing dimension that creates the most regulatory friction.

The trade-offs of local-first architecture

Local-first has trade-offs. Biometric processing on your machine uses your CPU and GPU and may be slower than dedicated cloud hardware. Cloud-processed modalities require an internet connection during indexing (searching is fully offline afterward). The proxy generation step adds a few minutes compared to direct cloud upload. For most users, these are minor overheads relative to the privacy benefit.

What stays local, what goes to the cloud

| Component | Location |
|---|---|
| Original video files | Your machine (never uploaded) |
| Proxy files for processing | Temporarily sent to cloud, then deleted |
| Transcription processing | Cloud |
| Object detection processing | Cloud |
| Scene description processing | Cloud |
| Face recognition processing | Your machine |
| Voice recognition processing | Your machine |
| Search index | Your machine |
| All search queries | Your machine (no network calls) |

Every query you run, every face you identify, and every search result you see is generated locally from your own index. The cloud is involved once, during indexing, and only with proxy data.


Your footage, your index, your machine. Join the waitlist to try privacy-first AI video search when FrameQuery launches.