Repo Storage is in Early Beta
This feature is actively being developed and refined. While stable for testing and development, expect improvements and potential changes as we optimize performance and capabilities.
Early Beta & Technology
Repo Storage is completely free during our early beta. Your code is indexed and made searchable with a simple import, using our latest, state-of-the-art embedding and re-rank models:
- morph-v4-embedding for code understanding
- morph-v4-rerank for top-tier code search quality
Quick Start
Use git like normal. We handle the vector database, embeddings, and infrastructure automatically. Set `index: true` to enable code embedding for semantic search. Each commit is indexed separately, letting you search specific branches or historical commits. See Semantic Search for search usage.

Enabling Semantic Search
By default, push does not generate embeddings. Set `index: true` for:
- Production commits you want to search
- Feature branches ready for review
- Any code you want semantically searchable

Leave indexing off (the default) for:
- Work-in-progress commits
- Binary file updates
- High-frequency automated commits
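To make the `index` flag concrete, here is a minimal, hypothetical sketch in Python. The `push` function below is a local stand-in, not the real Repo Storage client; only the `index` option and its default-off behavior come from the docs above.

```python
# Hypothetical stand-in for a Repo Storage push call.
def push(repo: str, *, index: bool = False) -> dict:
    """Submit a commit; embeddings are generated only when index=True."""
    return {"repo": repo, "indexed": index}

# A production commit you want semantically searchable:
release = push("my-service", index=True)

# A high-frequency automated commit, left unindexed (the default):
bot_commit = push("my-service")
```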
Waiting for Embeddings
Indexing happens in the background, so a fresh push may not be searchable right away. For testing, or when you need search results immediately, use a blocking push.

Blocking Push
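A blocking push can be sketched as submit-then-poll. Everything below is illustrative: `submit_push` and `indexing_status` are stand-ins for the real Repo Storage calls, and the flag and status names are assumptions.

```python
import time

# Stand-ins for the real Repo Storage API; names are illustrative.
def submit_push(repo: str, *, index: bool) -> dict:
    return {"repo": repo, "status": "indexing"}

def indexing_status(commit: dict) -> str:
    commit["status"] = "ready"  # pretend background indexing finished
    return commit["status"]

def push_and_wait(repo: str, *, timeout: float = 120.0, poll: float = 2.0) -> bool:
    """Push with indexing enabled, then block until embeddings are
    ready or the timeout elapses."""
    commit = submit_push(repo, index=True)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if indexing_status(commit) == "ready":
            return True
        time.sleep(poll)
    return False
```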
For testing or CI, making push wait until embeddings complete ensures that searches run immediately afterward see the new commit.

How It Works
Indexing — When you push with `index: true`:
1. Smart Chunking — Tree-sitter parses your code into semantic units (functions, classes) instead of arbitrary splits.
2. Code Embedding — Each chunk is converted to vectors using morph-v4-embedding, optimized for code understanding.
3. Ready to Search — Indexed and searchable in 3-100 seconds. Content-addressable caching means identical chunks share embeddings across repos.
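The "semantic units instead of arbitrary splits" idea can be illustrated with Python's standard-library `ast` module. This is an analogy only: Morph's pipeline uses Tree-sitter (which handles many languages), not the code below.

```python
import ast
import textwrap

def semantic_chunks(source: str) -> list[str]:
    """Split Python source into top-level semantic units (functions,
    classes) rather than fixed-size line windows."""
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]

sample = textwrap.dedent("""\
    def add(a, b):
        return a + b

    class Greeter:
        def hello(self):
            return "hi"
    """)

chunks = semantic_chunks(sample)
print(len(chunks))  # 2: one chunk per function/class
```

Each chunk is a self-contained unit of meaning, which is what makes the downstream embeddings useful for search.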
Search — Each query runs in two stages:
- Fast Retrieval — Find 50 candidates by embedding similarity (~130ms)
- Precision Reranking — Score each candidate with morph-v4-rerank (~700ms)
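The two-stage pattern above can be sketched in a few lines. This is a toy illustration of retrieve-then-rerank, not Morph's implementation: the 2-d "embeddings" and the `rerank` scorer stand in for morph-v4-embedding and morph-v4-rerank.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def two_stage_search(query_vec, corpus, rerank, k_candidates=50, k_final=5):
    # Stage 1: cheap embedding similarity narrows the corpus to candidates.
    candidates = sorted(
        corpus, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True
    )[:k_candidates]
    # Stage 2: a slower, more precise scorer reorders the short list.
    return sorted(candidates, key=rerank, reverse=True)[:k_final]

# Toy corpus with 2-d "embeddings" and a dummy reranker.
corpus = [
    {"id": "a", "vec": [1.0, 0.0]},
    {"id": "b", "vec": [0.9, 0.1]},
    {"id": "c", "vec": [0.0, 1.0]},
]
top = two_stage_search(
    [1.0, 0.0], corpus, rerank=lambda d: d["vec"][0], k_candidates=2, k_final=1
)
print(top[0]["id"])  # "a"
```

The design point: the expensive scorer only ever sees the small candidate set, so total latency stays bounded regardless of corpus size.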
Why is search so accurate?
Two-stage retrieval: fast embedding similarity finds candidates, then cross-attention reranking scores each one precisely.
Why is it so fast?
Content-addressable caching. Identical code chunks share embeddings across repos and commits. Most pushes only embed changed code.
What about large repos?
Tree-sitter parsing extracts semantic chunks (functions, classes) instead of arbitrary splits, and content-addressable caching means unchanged chunks are never re-embedded, so indexing cost tracks what changed rather than total repo size.