# Retrieval

> Simple, effective code retrieval strategies for any repository size

<Info>
  **Prerequisite**: You'll need an account on [Morph](https://morphllm.com/dashboard) to access embeddings for larger repositories.
</Info>

## The Right Tool for the Right Size

Code retrieval should be **simple by default, sophisticated when necessary**. The approach depends entirely on your repository size:

<CardGroup cols={2}>
  <Card title="Small Repos (<200 files)" icon="file-text">
    **Agent Search**: Let Claude navigate file paths and symbols directly
  </Card>

  <Card title="Large Repos (200+ files)" icon="funnel">
    **Retrieval Funnel**: Embeddings → Reranking → Agent Reading
  </Card>
</CardGroup>

## Small Repositories: Agent Search

For repositories under 200 files, skip the complexity. Use **agent search**: give Claude three simple tools, plus an optional fourth for semantic search:

### The Three Essential Tools

<Accordion title="1. list_dir: Directory Exploration">
  ```json theme={null}
  {
    "name": "list_dir",
    "description": "List directory contents to understand project structure",
    "parameters": {
      "type": "object",
      "properties": {
        "relative_workspace_path": {
          "description": "Path to list contents of, relative to workspace root",
          "type": "string"
        }
      },
      "required": ["relative_workspace_path"]
    }
  }
  ```

  Let Claude explore the file structure naturally.
</Accordion>

<Accordion title="2. file_search: Find Files by Name">
  ```json theme={null}
  {
    "name": "file_search", 
    "description": "Find files by partial filename match",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {
          "description": "Partial filename to search for",
          "type": "string"
        }
      },
      "required": ["query"]
    }
  }
  ```

  When Claude knows roughly what file it's looking for.
</Accordion>

<Accordion title="3. read_file: Read and Analyze Code">
  ```json theme={null}
  {
    "name": "read_file",
    "description": "Read file contents with optional line range",
    "parameters": {
      "type": "object",
      "properties": {
        "target_file": {
          "description": "Path to the file to read",
          "type": "string"
        },
        "start_line": {
          "description": "Optional: Start reading from this line",
          "type": "integer"
        },
        "end_line": {
          "description": "Optional: Stop reading at this line",
          "type": "integer"
        }
      },
      "required": ["target_file"]
    }
  }
  ```

  Let Claude decide what to read based on what it discovers.
</Accordion>

<Accordion title="4. semantic_grep: Semantic Code Search (Optional)">
  ```json theme={null}
  {
    "name": "semantic_grep",
    "description": "Search for code semantically similar to a query using embeddings",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {
          "description": "Natural language description of code to find",
          "type": "string"
        },
        "file_patterns": {
          "description": "Optional: File patterns to search within (e.g., '*.py', '*.ts')",
          "type": "array",
          "items": { "type": "string" }
        }
      },
      "required": ["query"]
    }
  }
  ```

  When Claude needs to find code by meaning, not just filename.

  ```typescript theme={null}
  // `embed` and `findSimilar` are placeholders for your embedding client and vector index.
  async function semanticGrep(query: string, filePatterns?: string[]) {
    // 1. Claude outputs a semantic query, e.g. "error handling for API calls"
    // 2. Embed the query
    const queryEmbedding = await embed(query);
    
    // 3. Compare against pre-embedded file chunks
    const matches = await findSimilar(queryEmbedding, {
      patterns: filePatterns,
      threshold: 0.7,
      limit: 10
    });
    
    return matches; // Claude gets semantic matches to explore
  }
  ```
</Accordion>
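
The `findSimilar` step in the optional `semantic_grep` tool is left undefined above. As a minimal sketch, assuming chunk embeddings are already computed and held in memory, it could be a cosine-similarity scan. The `EmbeddedChunk` and `SimilarOptions` types, the explicit `chunks` argument, and the suffix-only pattern matching are all assumptions made for illustration. A real setup would more likely query a vector database.

```typescript
// Hypothetical in-memory index entry; not part of any Morph API.
interface EmbeddedChunk {
  filePath: string;
  content: string;
  embedding: number[];
}

interface SimilarOptions {
  patterns?: string[]; // simplified: only "*.ext" suffix patterns or exact paths
  threshold: number;   // minimum cosine similarity to keep a match
  limit: number;       // maximum number of matches to return
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Simplified matcher: "*.ts" matches any path ending in ".ts".
function matchesPattern(filePath: string, pattern: string): boolean {
  return pattern.startsWith("*")
    ? filePath.endsWith(pattern.slice(1))
    : filePath === pattern;
}

function findSimilar(
  queryEmbedding: number[],
  chunks: EmbeddedChunk[],
  options: SimilarOptions
): { filePath: string; content: string; score: number }[] {
  const patterns = options.patterns ?? [];
  return chunks
    .filter((c) => patterns.length === 0 || patterns.some((p) => matchesPattern(c.filePath, p)))
    .map((c) => ({
      filePath: c.filePath,
      content: c.content,
      score: cosineSimilarity(queryEmbedding, c.embedding),
    }))
    .filter((m) => m.score >= options.threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, options.limit);
}
```

A linear scan like this is usually fine for a small repository; the threshold and limit mirror the `0.7` and `10` used in the example above.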

**Benefits:**

* Zero setup time
* No indexing required for the three core tools (only the optional `semantic_grep` needs pre-computed embeddings)
* Claude makes intelligent choices about what to read
* Works well for most day-to-day development tasks
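
The tool definitions above only describe the interface; your application still has to execute each call. The following is a minimal sketch of local handlers for `list_dir`, `file_search`, and `read_file` using Node's `fs` module. The function names, the `root` argument, the `maxResults` cap, the skipped directories, and the return shapes are assumptions for illustration, not part of any Morph or Anthropic API.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** Resolve a workspace-relative path and refuse anything that escapes the root. */
function resolveInWorkspace(root: string, relativePath: string): string {
  const rootAbs = path.resolve(root);
  const target = path.resolve(rootAbs, relativePath);
  if (target !== rootAbs && !target.startsWith(rootAbs + path.sep)) {
    throw new Error(`Path escapes workspace: ${relativePath}`);
  }
  return target;
}

/** list_dir: sorted entry names, with "/" appended to subdirectories. */
function listDir(root: string, relativeWorkspacePath: string): string[] {
  const dir = resolveInWorkspace(root, relativeWorkspacePath);
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .map((entry) => (entry.isDirectory() ? `${entry.name}/` : entry.name))
    .sort();
}

/** file_search: workspace-relative paths whose filename contains the query (case-insensitive). */
function fileSearch(root: string, query: string, maxResults = 20): string[] {
  const rootAbs = path.resolve(root);
  const needle = query.toLowerCase();
  const results: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (results.length >= maxResults) return;
      // Skip directories that are rarely useful to search (an assumption).
      if (entry.name === "node_modules" || entry.name === ".git") continue;
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) walk(full);
      else if (entry.name.toLowerCase().includes(needle)) {
        results.push(path.relative(rootAbs, full));
      }
    }
  };
  walk(rootAbs);
  return results;
}

/** read_file: file contents, optionally limited to a 1-indexed, inclusive line range. */
function readFile(root: string, targetFile: string, startLine?: number, endLine?: number): string {
  const lines = fs.readFileSync(resolveInWorkspace(root, targetFile), "utf8").split("\n");
  const start = Math.max((startLine ?? 1) - 1, 0);
  const end = Math.min(endLine ?? lines.length, lines.length);
  return lines.slice(start, end).join("\n");
}
```

Confining every path to the workspace root is a deliberate choice here: the agent chooses the arguments, so the handlers should not trust them.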

## Large Repositories: The Retrieval Funnel

When you hit **200+ files**, raw agent search becomes inefficient. Use the **retrieval funnel**:

<Steps>
  <Step title="🔍 Cast Wide Net">
    **Embeddings**: Find 50-100 potentially relevant code chunks using Morph Embeddings
  </Step>

  <Step title="⚡ Focus Down">
    **Reranking**: Narrow to the 5-10 most relevant pieces using Morph Rerank
  </Step>

  <Step title="🧠 Agent Reading">
    **Claude**: Inspect the ranked results and decide what to read in full
  </Step>
</Steps>

### Implementation: The Funnel Approach

```typescript theme={null}
import { OpenAI } from 'openai';

const openai = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: 'https://api.morphllm.com/v1'
});

async function retrievalFunnel(query: string) {
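  // Step 1: Cast wide net with embeddings
  const embeddingResponse = await openai.embeddings.create({
    model: "morph-embedding-v3",
    input: query
  });
  // The SDK returns one entry per input; take the query's vector.
  const queryEmbedding = embeddingResponse.data[0].embedding;

  // vectorSearch is a placeholder for your own vector index lookup.
  const wideResults = await vectorSearch(queryEmbedding, { limit: 50 });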
  
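  // Step 2: Focus with reranking
  // Note: the OpenAI SDK has no typed rerank method, and `completions.create` does not
  // declare `documents`, `query`, or `top_k`. Treat this call as illustrative and check
  // the Morph Rerank API reference for the exact endpoint and request fields.
  const reranked = await openai.completions.create({
    model: "morph-rerank-v3",
    documents: wideResults.map(r => r.content),
    query: query,
    top_k: 8
  });
  
  // Step 3: Let Claude decide what to read
  // provideCandidatesToClaude is a placeholder for however you hand results to the agent.
  return provideCandidatesToClaude(reranked.data);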
}
```
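
The final step, handing candidates to Claude, is left as a placeholder in the funnel above. One hypothetical way to do it is to format the reranked chunks as a short, numbered list with a preview of each, so the agent can decide which files to open with `read_file`. The `Candidate` shape, its field names, and the `formatCandidates` function are assumptions for illustration; the actual rerank response fields may differ.

```typescript
// Hypothetical shape of a reranked chunk; field names are assumptions.
interface Candidate {
  filePath: string;
  startLine: number;
  endLine: number;
  score: number;
  content: string;
}

/**
 * Build a compact text block listing candidates in ranked order.
 * Only the first `previewLines` lines of each chunk are included, so the
 * agent sees enough to judge relevance without filling its context window.
 */
function formatCandidates(query: string, candidates: Candidate[], previewLines = 5): string {
  const header =
    `Query: ${query}\n` +
    "Candidate code chunks, most relevant first. Use read_file to open any that look useful.";
  const body = candidates.map((c, i) => {
    const preview = c.content.split("\n").slice(0, previewLines).join("\n");
    return `[${i + 1}] ${c.filePath}:${c.startLine}-${c.endLine} (score ${c.score.toFixed(2)})\n${preview}`;
  });
  return [header, ...body].join("\n\n");
}
```

Passing locations and previews instead of full file contents leaves the final reading decision with the agent.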

### When to Use Each Approach

<Card title="Decision Tree" icon="decision-tree">
  ```
  Repository size?
  ├── < 200 files → Agent Search
  │   ├── Basic: list_dir + file_search + read_file
  │   └── +semantic_grep for meaning-based search (optional)
  └── 200+ files → Retrieval Funnel
      ├── < 1000 files → Basic embeddings + rerank
      └── 1000+ files → Add AST parsing + hybrid search
  ```
</Card>
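
The decision tree above can be written as a small function, which is handy if a tool needs to pick a strategy automatically. This is a minimal sketch using the document's 200 and 1000 file thresholds; the `RetrievalStrategy` type and `chooseStrategy` name are made up for illustration.

```typescript
// Hypothetical labels for the three branches of the decision tree.
type RetrievalStrategy = "agent-search" | "basic-funnel" | "advanced-funnel";

function chooseStrategy(fileCount: number): RetrievalStrategy {
  if (!Number.isInteger(fileCount) || fileCount < 0) {
    throw new RangeError(`fileCount must be a non-negative integer, got ${fileCount}`);
  }
  if (fileCount < 200) return "agent-search";    // list_dir + file_search + read_file
  if (fileCount < 1000) return "basic-funnel";   // embeddings + rerank
  return "advanced-funnel";                      // add AST parsing + hybrid search
}
```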

## Best Practices

<AccordionGroup>
  <Accordion title="Start Simple">
    * Begin with agent search for any new project
    * Only add complexity when Claude starts missing relevant code
    * Most projects never need embeddings
  </Accordion>

  <Accordion title="Upgrade When Needed">
    * Switch to the retrieval funnel when search becomes slow or inaccurate
    * Usually happens around 200-500 files depending on structure
    * Monitor Claude's success rate in finding relevant code
  </Accordion>

  <Accordion title="Keep Agent in the Loop">
    * Even with embeddings, let Claude make final reading decisions
    * Provide candidates, not conclusions
    * Claude's reasoning beats pure similarity matching
  </Accordion>
</AccordionGroup>
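
To make "monitor Claude's success rate" concrete, one option is to keep a small set of queries with known relevant files and measure how often retrieval surfaces at least one of them. The sketch below assumes a hand-labeled evaluation set and a synchronous `retrieve` function that returns file paths. `EvalCase`, `Retriever`, and `hitRate` are hypothetical names, and this is only one of several reasonable metrics.

```typescript
// A query paired with the files a correct retrieval should surface.
interface EvalCase {
  query: string;
  expectedFiles: string[];
}

// Any retrieval strategy, reduced to "query in, file paths out".
type Retriever = (query: string) => string[];

/** Fraction of cases where at least one expected file was retrieved. */
function hitRate(cases: EvalCase[], retrieve: Retriever): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) => {
    const found = new Set(retrieve(c.query));
    return c.expectedFiles.some((f) => found.has(f));
  }).length;
  return hits / cases.length;
}
```

Tracking this number as the repository grows gives a signal for when agent search alone is no longer enough.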

## Performance Expectations

<Card title="Typical Performance" icon="chart-line">
  <table>
    <thead>
      <tr>
        <th>Repository Size</th>
        <th>Approach</th>
        <th>Search Time</th>
        <th>Success Rate</th>
      </tr>
    </thead>

    <tbody>
      <tr>
        <td>\< 200 files</td>
        <td>Agent Search</td>
        <td>\< 5 seconds</td>
        <td>95%+</td>
      </tr>

      <tr>
        <td>200-1000 files</td>
        <td>Basic Funnel</td>
        <td>\< 10 seconds</td>
        <td>90%+</td>
      </tr>

      <tr>
        <td>1000+ files</td>
        <td>Advanced Funnel</td>
        <td>\< 15 seconds</td>
        <td>85%+</td>
      </tr>
    </tbody>
  </table>
</Card>

**The key insight**: Most code retrieval problems are solved by giving Claude the right navigation tools, not by throwing embeddings at everything.

Ready to implement? [Get your API key](https://morphllm.com/api-keys) for repositories that need the retrieval funnel, or just start with agent search for everything else.
