Context Selection for LLM Prompts
Effective context selection is crucial for getting the best results from large language models. This guide covers strategies for determining token limits and retrieving the most relevant context using Morph’s embeddings and reranking capabilities.
Determining Token Limits
The first step in context selection is deciding how many tokens to include in your prompts. There are two main approaches:
Fixed Token Limits
When using a fixed token limit approach, you allocate a predetermined number of tokens for context:
// Example with fixed context window
const MAX_CONTEXT_TOKENS = 50000; // For models with large context windows
const TOKENS_PER_MESSAGE = 4;     // Overhead per message
const TOKENS_PER_NAME = 1;        // Overhead for role name

function calculateAvailableContextTokens(promptTokens: number): number {
  const overheadTokens = TOKENS_PER_MESSAGE + TOKENS_PER_NAME;
  return MAX_CONTEXT_TOKENS - promptTokens - overheadTokens;
}
Pros:
Simple to implement
Predictable token usage and costs
Works well for standardized tasks
Cons:
Not optimized for complex tasks that need more context
May waste tokens for simpler tasks
Dynamic Token Limits
A dynamic approach adjusts the token limit based on task complexity:
// Example with dynamic context sizing
// Task shape assumed from the fields used below
interface Task {
  complexity: 'simple' | 'medium' | 'complex';
  requiresCodeUnderstanding: boolean;
  requiresMultiStepReasoning: boolean;
}

function calculateContextTokens(task: Task): number {
  let baseTokens: number;
  switch (task.complexity) {
    case 'simple':
      baseTokens = 8000;
      break;
    case 'medium':
      baseTokens = 16000;
      break;
    case 'complex':
      baseTokens = 32000;
      break;
    default:
      baseTokens = 16000;
  }

  // Adjust for additional factors
  if (task.requiresCodeUnderstanding) baseTokens += 8000;
  if (task.requiresMultiStepReasoning) baseTokens += 4000;

  return Math.min(baseTokens, 50000); // Cap at the model's context limit
}
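For example, a complex task that also requires code understanding yields a 40,000-token budget (32,000 base + 8,000 adjustment), still under the 50,000 cap:

const budget = calculateContextTokens({
  complexity: 'complex',
  requiresCodeUnderstanding: true,
  requiresMultiStepReasoning: false
});
console.log(budget); // 40000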
Pros:
More efficient token usage
Better results for complex tasks
Can optimize for cost on simpler queries
Cons:
More complex to implement
Requires task complexity estimation
Retrieving Relevant Context
Once you’ve determined your token budget, the next step is selecting the most relevant information for your context window.
Using Morph Embeddings
Morph provides an embeddings endpoint that creates vector representations of text, which can be used for similarity search:
import OpenAI from 'openai';

// Initialize the OpenAI client with Morph's API endpoint
const openai = new OpenAI({
  apiKey: 'your-morph-api-key',
  baseURL: 'https://api.morphllm.com/v1'
});

async function getEmbeddings(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "morph-embedding-v2",
    input: text,
  });
  return response.data[0].embedding;
}

// Example: Get embeddings for code chunks
async function embedCodeChunks(codeChunks: string[]): Promise<{ text: string, embedding: number[] }[]> {
  const results: { text: string, embedding: number[] }[] = [];
  for (const chunk of codeChunks) {
    const embedding = await getEmbeddings(chunk);
    results.push({
      text: chunk,
      embedding
    });
  }
  return results;
}

// Store these embeddings in a vector database of your choice
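If you are not using a dedicated vector database yet, a brute-force search over stored embeddings is often enough for small corpora. Here is a minimal sketch using cosine similarity; the topKSimilar helper is illustrative, not part of Morph's API:

// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks by similarity to a query embedding and keep the top k
function topKSimilar(
  queryEmbedding: number[],
  corpus: { text: string, embedding: number[] }[],
  k: number
): { text: string, score: number }[] {
  return corpus
    .map(item => ({ text: item.text, score: cosineSimilarity(queryEmbedding, item.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}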
Using Morph Reranking
After retrieving similar documents with embeddings, use Morph’s reranking to get the most relevant results:
// Using fetch with Morph's reranking endpoint
async function rerankDocuments(query: string, documents: string[]): Promise<{ document: string, relevanceScore: number }[]> {
  const response = await fetch('https://api.morphllm.com/v1/rerank', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer your-morph-api-key`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      query,
      documents,
      top_n: documents.length // Get all documents ranked
    })
  });

  const result = await response.json();
  return result.results.map((item: any) => ({
    document: documents[item.index],
    relevanceScore: item.relevance_score
  }));
}
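For example, called from inside an async function with a handful of placeholder documents, the function returns them ordered by relevance:

const ranked = await rerankDocuments(
  'How do I rotate my API key?',
  ['API key rotation guide...', 'Billing overview...', 'Getting started...']
);
console.log(ranked[0].document); // Most relevant document first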
Basic Retrieval Strategy
Here’s a basic implementation that combines embedding search with reranking:
import OpenAI from 'openai';
import { VectorDB } from 'your-vector-db'; // Replace with your vector database

const openai = new OpenAI({
  apiKey: 'your-morph-api-key',
  baseURL: 'https://api.morphllm.com/v1'
});

const vectorDb = new VectorDB(); // Initialize your vector DB

async function retrieveContext(query: string, maxTokens: number): Promise<string[]> {
  // Step 1: Get query embedding
  const queryEmbedding = await openai.embeddings.create({
    model: "morph-embedding-v2",
    input: query,
  });

  // Step 2: Find the top 10 similar documents
  const similarDocs = await vectorDb.similaritySearch(
    queryEmbedding.data[0].embedding,
    10
  );

  // Step 3: Rerank the documents
  const rerankedDocs = await fetch('https://api.morphllm.com/v1/rerank', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer your-morph-api-key`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      query,
      documents: similarDocs.map(doc => doc.text),
      top_n: similarDocs.length
    })
  }).then(res => res.json());

  // Step 4: Select top documents that fit within the token limit
  let tokenCount = 0;
  const selectedDocs: string[] = [];

  for (const result of rerankedDocs.results) {
    const doc = similarDocs[result.index];
    const docTokens = estimateTokens(doc.text);
    if (tokenCount + docTokens <= maxTokens) {
      selectedDocs.push(doc.text);
      tokenCount += docTokens;
    } else {
      break;
    }
  }

  return selectedDocs;
}

// Helper function to estimate tokens (actual implementation depends on your tokenizer)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // Simple approximation
}
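The length/4 heuristic is only a rough approximation for English text. For exact counts, swap in a real tokenizer; a sketch assuming the js-tiktoken package (verify the encoding matches your target model):

import { getEncoding } from 'js-tiktoken';

const enc = getEncoding('cl100k_base');

function countTokens(text: string): number {
  return enc.encode(text).length;
}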
Advanced Retrieval Strategy
For even better results, use a specialized query generated by an LLM:
async function advancedContextRetrieval(userQuery: string, maxTokens: number): Promise<string[]> {
  // Step 1: Generate an optimized lookup query using GPT-4o
  // (assumes the endpoint behind this client serves gpt-4o; you may need a separate OpenAI client)
  const lookupQueryResponse = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: "You are a query optimization expert. Generate the best possible search query to find relevant information based on the user's question."
      },
      {
        role: "user",
        content: `Generate a search query to find information about: ${userQuery}`
      }
    ]
  });

  // Fall back to the original query if generation returns no content
  const optimizedQuery = lookupQueryResponse.choices[0].message.content ?? userQuery;

  // Step 2: Get embeddings for the optimized query
  const queryEmbedding = await openai.embeddings.create({
    model: "morph-embedding-v2",
    input: optimizedQuery,
  });

  // Step 3: Find the top 20 similar documents (a larger initial set)
  const similarDocs = await vectorDb.similaritySearch(
    queryEmbedding.data[0].embedding,
    20
  );

  // Step 4: Rerank with the original query for better precision
  const rerankedDocs = await fetch('https://api.morphllm.com/v1/rerank', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer your-morph-api-key`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      query: userQuery, // Use the original query for reranking
      documents: similarDocs.map(doc => doc.text),
      top_n: similarDocs.length
    })
  }).then(res => res.json());

  // Step 5: Select documents that fit within the token limit
  let tokenCount = 0;
  const selectedDocs: string[] = [];

  for (const result of rerankedDocs.results) {
    const doc = similarDocs[result.index];
    const docTokens = estimateTokens(doc.text);
    if (tokenCount + docTokens <= maxTokens) {
      selectedDocs.push(doc.text);
      tokenCount += docTokens;
    } else {
      break;
    }
  }

  return selectedDocs;
}
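Usage mirrors the basic version; only the query-rewriting step differs. For example, from an async context (the query string is a placeholder):

const context = await advancedContextRetrieval(
  'Why does my auth middleware reject valid tokens?',
  16000
);
const prompt = `Context:\n${context.join('\n---\n')}\n\nQuestion: Why does my auth middleware reject valid tokens?`;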
Monitoring and Re-embedding with Morph SDK
To keep your embeddings fresh, use Morph’s SDK to monitor file changes and trigger re-embeddings:
import { MorphSDK } from '@morphllm/sdk';
import { watch, readFileSync } from 'fs';

const morph = new MorphSDK({
  apiKey: 'your-morph-api-key'
});

// Watch for file changes in your codebase
function setupFileWatcher(directory: string): void {
  watch(directory, { recursive: true }, async (eventType, filename) => {
    if (filename && (filename.endsWith('.js') || filename.endsWith('.ts'))) {
      console.log(`File changed: ${filename}`);

      // Read the file content
      const content = readFileSync(`${directory}/${filename}`, 'utf-8');

      // Generate a new embedding
      const embedding = await morph.embeddings.create({
        model: "morph-embedding-v2",
        input: content
      });

      // Update your vector database
      await vectorDb.updateEmbedding(filename, embedding.data[0].embedding, content);
    }
  });
}
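Note that fs.watch can fire several events for a single save, so in practice you will likely want to debounce re-embedding per file. A minimal sketch (the 500 ms delay is an arbitrary choice):

const pendingTimers = new Map<string, ReturnType<typeof setTimeout>>();

// Delay work until a file has been quiet for delayMs, resetting the timer on each new event
function debounce(filename: string, work: () => void, delayMs = 500): void {
  const existing = pendingTimers.get(filename);
  if (existing) clearTimeout(existing);
  pendingTimers.set(filename, setTimeout(() => {
    pendingTimers.delete(filename);
    work();
  }, delayMs));
}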
Testing shows that the advanced retrieval method typically yields significant improvements:
| Retrieval Strategy | Accuracy | Response Quality | Relative Performance |
| --- | --- | --- | --- |
| Basic Similarity | Good | Moderate | Baseline |
| Basic + Reranking | Better | Good | +10% |
| Advanced Method | Best | Excellent | +20% |
Best Practices
Chunk your content intelligently: Split documents at logical boundaries like functions or paragraphs (see the sketch after this list).
Balance precision vs. recall: Start with a larger set of documents before reranking to ensure you don't miss relevant information.
Consider document diversity: Include different types of context (code, documentation, examples) for more comprehensive responses.
Monitor and refine: Track which contexts lead to better responses and adjust your strategy accordingly.
Use domain-specific embeddings: When available, use embeddings trained on code or your specific domain.
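As a starting point for chunking, splitting on blank lines keeps paragraphs intact while packing them into token-bounded chunks; splitting code at function boundaries generally requires a language-aware parser. A simple paragraph-based sketch reusing the estimateTokens helper from above:

function chunkByParagraph(text: string, maxTokensPerChunk = 512): string[] {
  const paragraphs = text.split(/\n\s*\n/);
  const chunks: string[] = [];
  let current = '';
  for (const para of paragraphs) {
    const candidate = current ? `${current}\n\n${para}` : para;
    if (current && estimateTokens(candidate) > maxTokensPerChunk) {
      chunks.push(current); // Close the current chunk and start a new one
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}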
For access to our latest models, self-hosting, or business inquiries, please contact us at info@morphllm.com .