Features · Visual Search

Describe what you need,
find the exact frame

Type what you're looking for in plain language. FrameQuery matches your description against every scene in your library using AI visual embeddings and semantic transcript analysis.

How It Works

Two AI models, one search bar

Visual search and semantic transcript search run simultaneously when you type a query. Results are ranked by combined relevance so the best matches surface first.
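The two result lists can be merged with a weighted combined score; a minimal Python sketch (the `merge_results` name and the 60/40 weights are illustrative assumptions, not FrameQuery's actual ranking):

```python
def merge_results(visual, transcript, w_visual=0.6, w_text=0.4):
    """Combine per-scene scores from the two searches into one ranking.

    `visual` and `transcript` map scene IDs to normalized [0, 1]
    relevance scores. A scene found by only one search still ranks,
    just with the missing score treated as 0.
    """
    combined = {}
    for scene_id in set(visual) | set(transcript):
        combined[scene_id] = (w_visual * visual.get(scene_id, 0.0)
                              + w_text * transcript.get(scene_id, 0.0))
    # Highest combined relevance first
    return sorted(combined, key=combined.get, reverse=True)
```

A scene that scores well on both signals outranks one that scores well on only one, which is why the best matches surface first.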

Visual search (SigLIP)

SigLIP visual embeddings (a CLIP-style image-text model) match your text description against scene thumbnails. Search for “sunset over water” or “person writing on whiteboard” and get frame-accurate results from scenes that look like what you described.
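The matching step can be pictured as cosine similarity between the embedded text query and each thumbnail's embedding; a toy sketch with hypothetical 2-dimensional vectors standing in for SigLIP's real, much larger embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical
    direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_query(query_embedding, scene_embeddings):
    """Order scene IDs by similarity to the embedded text query,
    most similar first."""
    return sorted(
        scene_embeddings,
        key=lambda sid: cosine_similarity(query_embedding, scene_embeddings[sid]),
        reverse=True,
    )
```

In practice the query text and every thumbnail are embedded into the same vector space by the model, so a purely textual description can rank images.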

Semantic transcript search (MiniLM)

Sentence embeddings match the meaning of your query against transcript segments. “Discussing the project timeline” finds “we need to figure out when this ships” even though the words are completely different.

Find Similar

Pick a scene, find every scene like it

Click “Find Similar” on any scene card to search for visually similar scenes across your library. Results are ranked by cosine distance between SigLIP embeddings, so the closest visual matches appear first.
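In sketch form, “Find Similar” is a nearest-neighbor lookup over the stored embeddings; `find_similar` and the tiny vectors below are illustrative, not FrameQuery's code:

```python
import math

def find_similar(scene_id, embeddings):
    """Rank every other scene by cosine distance (1 - cosine similarity)
    to the chosen scene's embedding; closest first."""
    def cos_dist(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / (norm_a * norm_b)

    query = embeddings[scene_id]  # the picked scene is the query
    others = [sid for sid in embeddings if sid != scene_id]
    return sorted(others, key=lambda sid: cos_dist(query, embeddings[sid]))
```

The picked scene's own embedding plays the role of the query, which is why no text needs to be typed.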

Scope toggle

Search within the current video or across your entire library.

Rich result cards

Results show thumbnails, similarity scores, and match reason pills.

Click to jump

Click any result to jump to that scene in its video.

Text-based fallback

Works without visual models, too: if the SigLIP models aren't downloaded, search falls back to text-based similarity (Tantivy BM25).
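BM25 ranks documents by term frequency, document frequency, and length normalization. A minimal Okapi BM25 sketch for intuition (not Tantivy's implementation, whose internals and defaults may differ):

```python
import math

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score tokenized documents against query terms with Okapi BM25.

    `docs` maps a document ID to its list of tokens. Rare terms get a
    higher IDF weight; long documents are penalized via `b`.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs.values()) / n_docs
    # document frequency of each query term
    df = {t: sum(1 for d in docs.values() if t in d) for t in query_terms}

    scores = {}
    for doc_id, tokens in docs.items():
        score = 0.0
        for t in query_terms:
            f = tokens.count(t)  # term frequency in this document
            if f == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = f + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * f * (k1 + 1) / norm
        scores[doc_id] = score
    return scores
```

Unlike the embedding search above, BM25 only matches literal words, which is why it serves as the fallback rather than the default.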

Search Features

More ways to find what you need

Color search

Filter results by dominant scene color. Click a color swatch in the search filters to find scenes with matching color palettes.
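Swatch filtering can be sketched as a distance test in RGB space; `filter_by_color` and the tolerance value are assumptions for illustration, not FrameQuery's actual matcher:

```python
import math

def filter_by_color(scenes, swatch, tolerance=60):
    """Keep scenes whose dominant RGB color lies within `tolerance`
    (Euclidean distance in RGB space) of the clicked swatch.

    `scenes` maps scene IDs to (r, g, b) dominant-color tuples.
    """
    return [
        scene_id
        for scene_id, color in scenes.items()
        if math.dist(color, swatch) <= tolerance
    ]
```

A perceptual color space (e.g. CIELAB) would match human judgment better than raw RGB, but the filtering idea is the same.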

Object detection

Automatically detected objects (people, vehicles, animals, text, props) are searchable alongside visual descriptions. Search for “laptop” or “red car” to find every occurrence.
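Object-tag search amounts to an inverted index from detected labels to scenes; a hypothetical sketch, assuming labels are stored per scene:

```python
def build_object_index(scene_objects):
    """Map each detected object label to the set of scenes containing it.

    `scene_objects` maps a scene ID to its list of detected labels.
    """
    index = {}
    for scene_id, labels in scene_objects.items():
        for label in labels:
            index.setdefault(label, set()).add(scene_id)
    return index
```

Looking up “laptop” then becomes a single dictionary access rather than a scan of every scene.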

Offline and instant

Both AI models (SigLIP and MiniLM ONNX) run locally on your machine. No API calls, no per-query cost. Visual embeddings are stored in your local search index.

About the visual search models

Visual search models are optional and download on demand. SigLIP handles image-text matching, MiniLM handles semantic text similarity. Both are ONNX format and run with CUDA or Metal acceleration when available, with CPU fallback. Your search index works without them (keyword search only) until you choose to enable visual search.
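Hardware fallback can be sketched as an ordered preference list; the provider strings below are ONNX Runtime's standard identifiers, but `pick_providers` itself is an illustrative sketch, not FrameQuery's code:

```python
def pick_providers(available):
    """Choose ONNX Runtime execution providers in preference order:
    CUDA (NVIDIA GPU), then CoreML (Apple Metal), then CPU.

    `available` is the list of providers installed on this machine;
    CPU is always the last resort.
    """
    preferred = [
        "CUDAExecutionProvider",    # NVIDIA GPU acceleration
        "CoreMLExecutionProvider",  # Apple GPU/Neural Engine
        "CPUExecutionProvider",     # universal fallback
    ]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]
```

Passing the resulting list when creating an inference session lets the runtime use the fastest available backend and silently fall through to the CPU.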