How It Works
Visual search and semantic transcript search run simultaneously when you type a query. Results are ranked by combined relevance so the best matches surface first.
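As a rough illustration of the idea above, the sketch below runs two searches concurrently and merges their scores. The functions `visual_search` and `transcript_search`, the clip names, the scores, and the weighted-sum rule are all hypothetical; the page does not say how combined relevance is actually computed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two search backends. Each returns
# {clip name: relevance score in [0, 1]}; the values are made up.
def visual_search(query):
    return {"clip_a.mov": 0.82, "clip_b.mp4": 0.40}

def transcript_search(query):
    return {"clip_b.mp4": 0.75, "clip_c.mov": 0.30}

def combined_search(query, visual_weight=0.5):
    # Run both searches at the same time and wait for both to finish.
    with ThreadPoolExecutor(max_workers=2) as pool:
        visual = pool.submit(visual_search, query)
        transcript = pool.submit(transcript_search, query)
        visual_hits, transcript_hits = visual.result(), transcript.result()

    # Assumed combination rule: a weighted sum, where a clip missing
    # from one result set scores 0 for that search.
    combined = {}
    for name in visual_hits.keys() | transcript_hits.keys():
        combined[name] = (
            visual_weight * visual_hits.get(name, 0.0)
            + (1 - visual_weight) * transcript_hits.get(name, 0.0)
        )
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)

ranked = combined_search("lake at sunset")
```

A clip that scores moderately in both searches can outrank one that scores well in only one, which is the practical effect of ranking on a combined score.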
Example results (3 results · 12ms):
A001_C003.mov (Visual)
B_ROLL_LAKE.mp4 (Transcript)
INTERVIEW_EXT.mov (Object)
Find Similar
Click “Find Similar” on any scene card to search your library for visually similar scenes. Results are ranked by the cosine distance between SigLIP embeddings, with the smallest distance (the closest visual match) first.
Source Scene
An aerial view shows a vast, dry grassland dotted with scattered trees and rounded hills under a bright, cloudy sky.
Similar Scenes
A vast golden field of dry grass with a dirt road cutting through, leading towards distant rolling hills.
Savannah landscape with scattered acacia trees stretching to the horizon under golden afternoon light.
A large herd of animals moving in a winding line across dry grassland with scattered trees.
Wide aerial shot of river winding through arid plains at dusk with long shadows.
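The cosine-distance ranking behind “Find Similar” can be sketched in a few lines. This is a minimal illustration, not the product's implementation: the function names, the `top_k` parameter, and the toy 3-dimensional vectors are assumptions, and real SigLIP embeddings have far more dimensions.

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 means same direction, 2 means opposite.
    # Assumes both vectors are non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def find_similar(source_id, embeddings, top_k=4):
    # Compare the source scene against every other scene, then
    # return the top_k scenes with the smallest distance.
    source = embeddings[source_id]
    distances = [
        (scene_id, cosine_distance(source, vector))
        for scene_id, vector in embeddings.items()
        if scene_id != source_id
    ]
    distances.sort(key=lambda pair: pair[1])
    return distances[:top_k]

# Toy embeddings for illustration only.
embeddings = {
    "source": [1.0, 0.0, 0.0],
    "near": [0.9, 0.1, 0.0],
    "far": [0.0, 1.0, 0.0],
    "opposite": [-1.0, 0.0, 0.0],
}
similar = find_similar("source", embeddings)
```

Cosine distance depends only on the direction of the vectors, not their length, so two scenes match when their embeddings point the same way regardless of magnitude.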
Search Features
Search video by color to find scenes that match a specific palette. Search by scene description to locate the right mood, setting, or composition. Object detection tags every scene with what appears on screen, so you can search for “laptop” or “car” directly. Find visually similar shots by clicking any scene card. Everything runs locally and offline with no cloud dependency.
Visual search models are optional and download on demand. SigLIP handles image-text matching, and MiniLM handles semantic text similarity. Both are in ONNX format and run with CUDA or Metal acceleration when available, falling back to the CPU otherwise. Until you enable visual search, your search index still works without these models, in keyword-search-only mode.
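The accelerator-then-CPU fallback described above might look like the sketch below. It assumes ONNX Runtime's execution-provider naming, which this page does not confirm as the runtime in use; `CoreMLExecutionProvider` stands in for "Metal" on Apple hardware, and `pick_providers` and the preference order are hypothetical.

```python
# Assumed preference order: GPU-backed providers first, CPU last.
PREFERRED_PROVIDERS = [
    "CUDAExecutionProvider",    # NVIDIA GPUs
    "CoreMLExecutionProvider",  # Apple hardware; stand-in for "Metal" here
    "CPUExecutionProvider",     # fallback
]

def pick_providers(available):
    # Keep the preferred providers that this machine reports as
    # available, preserving the preference order.
    chosen = [name for name in PREFERRED_PROVIDERS if name in available]
    # Always end with the CPU provider so model loading has a fallback.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# Example: a machine that reports CUDA and CPU support.
providers = pick_providers(["CPUExecutionProvider", "CUDAExecutionProvider"])
```

With the ONNX Runtime Python package, the `available` list would come from `onnxruntime.get_available_providers()`, and the result could be passed as the `providers` argument of `onnxruntime.InferenceSession`.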