Context Selection for LLM Prompts
Effective context selection is crucial for getting the best results from large language models. This guide covers strategies for determining token limits and retrieving the most relevant context using Morph’s embeddings and reranking capabilities.

Determining Token Limits
The first step in context selection is deciding how many tokens to include in your prompts. There are two main approaches:
Fixed Token Limit
When using a fixed token limit approach, you allocate a predetermined number of tokens for context (a small sketch follows the pros and cons below).

Pros:
- Simple to implement
- Predictable token usage and costs
- Works well for standardized tasks
Cons:
- Not optimized for complex tasks that need more context
- May waste tokens for simpler tasks
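As a rough illustration, here is a minimal sketch of a fixed budget: pre-split chunks are packed until a predetermined limit is reached. The budget value and the tiktoken tokenizer are illustrative assumptions; use whatever tokenizer matches your model.

```python
import tiktoken

FIXED_CONTEXT_BUDGET = 2000  # illustrative budget; tune for your model and task

def select_fixed_context(chunks: list[str]) -> str:
    enc = tiktoken.get_encoding("cl100k_base")
    selected, used = [], 0
    for chunk in chunks:
        cost = len(enc.encode(chunk))
        if used + cost > FIXED_CONTEXT_BUDGET:
            break  # stop once the next chunk would overflow the budget
        selected.append(chunk)
        used += cost
    return "\n\n".join(selected)
```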
Dynamic Token Limit
A dynamic approach adjusts the token limit based on task complexity (a small sketch follows the pros and cons below).

Pros:
- More efficient token usage
- Better results for complex tasks
- Can optimize for cost on simpler queries
Cons:
- More complex to implement
- Requires task complexity estimation
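A minimal sketch of the dynamic approach, assuming a crude keyword-and-length heuristic for complexity estimation; the signals and numbers below are illustrative, not a recommendation.

```python
def dynamic_budget(query: str, min_budget: int = 1000, max_budget: int = 6000) -> int:
    # Count simple complexity signals in the query.
    signals = sum([
        len(query.split()) > 50,   # long, detailed request
        query.count("?") > 1,      # multi-part question
        any(w in query.lower() for w in ("refactor", "architecture", "debug")),
    ])
    # Scale linearly between the floor and ceiling as signals accrue.
    return min_budget + (max_budget - min_budget) * signals // 3
```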
Retrieving Relevant Context
Once you’ve determined your token budget, the next step is selecting the most relevant information for your context window.

Using Morph Embeddings
Morph provides an embeddings endpoint that creates vector representations of text, which can be used for similarity search:
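A minimal sketch of embedding document chunks, assuming Morph exposes an OpenAI-compatible embeddings endpoint. The base URL and model name (`morph-embedding-v2`) are placeholders; substitute the values from Morph’s documentation.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MORPH_API_KEY",
    base_url="https://api.morphllm.com/v1",  # assumed OpenAI-compatible endpoint
)

def embed_texts(texts: list[str]) -> list[list[float]]:
    """Return one embedding vector per input text."""
    response = client.embeddings.create(
        model="morph-embedding-v2",  # placeholder model name
        input=texts,
    )
    return [item.embedding for item in response.data]

chunk_vectors = embed_texts(["def parse_config(path): ...", "README: installation steps"])
query_vector = embed_texts(["How do I load the configuration file?"])[0]
```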
Using Morph Reranking

After retrieving similar documents with embeddings, use Morph’s reranking to get the most relevant results:
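A sketch of the reranking step. The route, model name, and response shape below are assumptions modeled on common rerank APIs (`query`, `documents`, `top_n`); check Morph’s documentation for the actual contract.

```python
import requests

def rerank(query: str, documents: list[str], top_n: int = 5) -> list[str]:
    response = requests.post(
        "https://api.morphllm.com/v1/rerank",  # assumed route
        headers={"Authorization": "Bearer YOUR_MORPH_API_KEY"},
        json={
            "model": "morph-rerank-v2",  # placeholder model name
            "query": query,
            "documents": documents,
            "top_n": top_n,
        },
        timeout=30,
    )
    response.raise_for_status()
    # Assumed response shape: {"results": [{"index": i, "relevance_score": s}, ...]}
    return [documents[r["index"]] for r in response.json()["results"]]
```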
Basic Retrieval Strategy

Here’s a basic implementation that combines embedding search with reranking:
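A sketch combining the two steps: a cosine-similarity search over the stored vectors produces a generous candidate set, and the reranker narrows it to the final few. It reuses the hypothetical `embed_texts()` and `rerank()` helpers sketched above.

```python
import numpy as np

def basic_retrieve(query: str, chunks: list[str], chunk_vectors: list[list[float]],
                   candidates: int = 20, top_n: int = 5) -> list[str]:
    query_vec = np.array(embed_texts([query])[0])
    matrix = np.array(chunk_vectors)
    # Cosine similarity between the query and every stored chunk vector.
    scores = (matrix @ query_vec) / (
        np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec)
    )
    # Take a generous candidate set from the embedding search...
    candidate_idx = np.argsort(scores)[::-1][:candidates]
    # ...then let the reranker pick the most relevant few.
    return rerank(query, [chunks[i] for i in candidate_idx], top_n=top_n)
```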
Advanced Retrieval Strategy

For even better results, use a specialized query generated by an LLM:
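A sketch of the advanced strategy: an LLM first rewrites the user’s question into a focused search query, then the basic pipeline runs on that query. The chat model name is a placeholder, and reusing the same client assumes the endpoint also serves chat completions.

```python
def advanced_retrieve(user_question: str, chunks: list[str],
                      chunk_vectors: list[list[float]]) -> list[str]:
    # Ask an LLM for a keyword-rich search query tailored to the question.
    rewrite = client.chat.completions.create(
        model="YOUR_CHAT_MODEL",  # placeholder; any chat-completions model works
        messages=[
            {"role": "system",
             "content": "Rewrite the user's question as a short, keyword-rich search query."},
            {"role": "user", "content": user_question},
        ],
    )
    search_query = rewrite.choices[0].message.content.strip()
    # Run the ordinary embed-then-rerank pipeline on the rewritten query.
    return basic_retrieve(search_query, chunks, chunk_vectors)
```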
Monitoring and Re-embedding with Morph SDK

To keep your embeddings fresh, use Morph’s SDK to monitor file changes and trigger re-embeddings:
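Since the Morph SDK’s file-watching interface is not reproduced here, the sketch below substitutes the generic `watchdog` library to show the shape of the workflow: watch a directory and re-embed a file whenever it changes, reusing the hypothetical `embed_texts()` helper from above.

```python
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class ReEmbedHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.is_directory:
            return
        with open(event.src_path, encoding="utf-8") as f:
            text = f.read()
        vector = embed_texts([text])[0]
        # Persist the refreshed vector to your vector store here.
        print(f"re-embedded {event.src_path} ({len(vector)} dims)")

observer = Observer()
observer.schedule(ReEmbedHandler(), path="./src", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
```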
Performance Comparison

Testing shows that the advanced retrieval method typically yields significant improvements:

| Retrieval Strategy | Accuracy | Response Quality | Relative Performance |
|---|---|---|---|
| Basic Similarity | Good | Moderate | Baseline |
| Basic + Reranking | Better | Good | +10% |
| Advanced Method | Best | Excellent | +20% |
Best Practices
- Chunk your content intelligently: Split documents at logical boundaries like functions or paragraphs (see the sketch after this list).
- Balance precision vs. recall: Start with a larger set of documents before reranking to ensure you don’t miss relevant information.
- Consider document diversity: Include different types of context (code, documentation, examples) for more comprehensive responses.
- Monitor and refine: Track which contexts lead to better responses and adjust your strategy accordingly.
- Use domain-specific embeddings: When available, use embeddings trained on code or your specific domain.
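As a concrete example of the first practice, here is a minimal sketch of boundary-aware chunking for Python source: split before each top-level `def` or `class` so every chunk is a logical unit. It is a regex heuristic, not a full parser.

```python
import re

def chunk_python_source(source: str) -> list[str]:
    # Split immediately before each top-level def/class declaration.
    pieces = re.split(r"(?m)^(?=def |class )", source)
    return [p.strip() for p in pieces if p.strip()]
```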