Context Selection for LLM Prompts
Effective context selection is crucial for getting the best results from large language models. This guide covers strategies for determining token limits and retrieving the most relevant context using Morph’s embeddings and reranking capabilities.
Determining Token Limits
The first step in context selection is deciding how many tokens to include in your prompts. There are two main approaches:
Fixed Token Limits
When using a fixed token limit approach, you allocate a predetermined number of tokens for context:
// Example with fixed context window
const MAX_CONTEXT_TOKENS = 50000; // For models with large context windows
const TOKENS_PER_MESSAGE = 4;     // Overhead per message
const TOKENS_PER_NAME = 1;        // Overhead for role name

function calculateAvailableContextTokens(promptTokens: number): number {
  const overheadTokens = TOKENS_PER_MESSAGE + TOKENS_PER_NAME;
  return MAX_CONTEXT_TOKENS - promptTokens - overheadTokens;
}
Pros:
Simple to implement
Predictable token usage and costs
Works well for standardized tasks
Cons:
Not optimized for complex tasks that need more context
May waste tokens for simpler tasks
Dynamic Token Limits
A dynamic approach adjusts the token limit based on task complexity:
// Example with dynamic context sizing
// Task shape assumed from the fields used below
interface Task {
  complexity: 'simple' | 'medium' | 'complex';
  requiresCodeUnderstanding: boolean;
  requiresMultiStepReasoning: boolean;
}

function calculateContextTokens(task: Task): number {
  let baseTokens: number;
  switch (task.complexity) {
    case 'simple':
      baseTokens = 8000;
      break;
    case 'medium':
      baseTokens = 16000;
      break;
    case 'complex':
      baseTokens = 32000;
      break;
    default:
      baseTokens = 16000;
  }

  // Adjust for additional factors
  if (task.requiresCodeUnderstanding) baseTokens += 8000;
  if (task.requiresMultiStepReasoning) baseTokens += 4000;

  return Math.min(baseTokens, 50000); // Cap at the model's context limit
}
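For example, a complex task that also requires code understanding yields a 40,000-token budget (32,000 base + 8,000 adjustment), still under the 50,000 cap:

const budget = calculateContextTokens({
  complexity: 'complex',
  requiresCodeUnderstanding: true,
  requiresMultiStepReasoning: false
});
console.log(budget); // 40000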
Pros:
More efficient token usage
Better results for complex tasks
Can optimize for cost on simpler queries
Cons:
More complex to implement
Requires task complexity estimation
Retrieving Relevant Context
Once you’ve determined your token budget, the next step is selecting the most relevant information for your context window.
Using Morph Embeddings
Morph provides an embeddings endpoint that creates vector representations of text, which can be used for similarity search:
import OpenAI from 'openai';

// Initialize the OpenAI client with Morph's API endpoint
const openai = new OpenAI({
  apiKey: 'your-morph-api-key',
  baseURL: 'https://api.morphllm.com/v1'
});

async function getEmbeddings(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "morph-embedding-v2",
    input: text,
  });
  return response.data[0].embedding;
}

// Example: Get embeddings for code chunks
async function embedCodeChunks(codeChunks: string[]): Promise<{ text: string, embedding: number[] }[]> {
  const results: { text: string, embedding: number[] }[] = [];
  for (const chunk of codeChunks) {
    const embedding = await getEmbeddings(chunk);
    results.push({
      text: chunk,
      embedding
    });
  }
  return results;
}

// Store these embeddings in a vector database of your choice
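If you are not using a dedicated vector database yet, a brute-force search over stored embeddings is often enough for small corpora. Here is a minimal sketch using cosine similarity; the topKSimilar helper is illustrative, not part of Morph's API:

// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks by similarity to a query embedding and keep the top k
function topKSimilar(
  queryEmbedding: number[],
  corpus: { text: string, embedding: number[] }[],
  k: number
): { text: string, score: number }[] {
  return corpus
    .map(item => ({ text: item.text, score: cosineSimilarity(queryEmbedding, item.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}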
Using Morph Reranking
After retrieving similar documents with embeddings, use Morph’s reranking to get the most relevant results:
// Using fetch with Morph's reranking endpoint
async function rerankDocuments(query: string, documents: string[]): Promise<{ document: string, relevanceScore: number }[]> {
  const response = await fetch('https://api.morphllm.com/v1/rerank', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer your-morph-api-key`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      query,
      documents,
      top_n: documents.length // Get all documents ranked
    })
  });

  const result = await response.json();
  return result.results.map((item: any) => ({
    document: documents[item.index],
    relevanceScore: item.relevance_score
  }));
}
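For example, called from inside an async function with a handful of placeholder documents, the function returns them ordered by relevance:

const ranked = await rerankDocuments(
  'How do I rotate my API key?',
  ['API key rotation guide...', 'Billing overview...', 'Getting started...']
);
console.log(ranked[0].document); // Most relevant document first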
Basic Retrieval Strategy
Here’s a basic implementation that combines embedding search with reranking:
import OpenAI from 'openai';
import { VectorDB } from 'your-vector-db'; // Replace with your vector database

const openai = new OpenAI({
  apiKey: 'your-morph-api-key',
  baseURL: 'https://api.morphllm.com/v1'
});

const vectorDb = new VectorDB(); // Initialize your vector DB

async function retrieveContext(query: string, maxTokens: number): Promise<string[]> {
  // Step 1: Get query embedding
  const queryEmbedding = await openai.embeddings.create({
    model: "morph-embedding-v2",
    input: query,
  });

  // Step 2: Find the top 10 similar documents
  const similarDocs = await vectorDb.similaritySearch(
    queryEmbedding.data[0].embedding,
    10
  );

  // Step 3: Rerank the documents
  const rerankedDocs = await fetch('https://api.morphllm.com/v1/rerank', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer your-morph-api-key`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      query,
      documents: similarDocs.map(doc => doc.text),
      top_n: similarDocs.length
    })
  }).then(res => res.json());

  // Step 4: Select top documents that fit within the token limit
  let tokenCount = 0;
  const selectedDocs: string[] = [];

  for (const result of rerankedDocs.results) {
    const doc = similarDocs[result.index];
    const docTokens = estimateTokens(doc.text);
    if (tokenCount + docTokens <= maxTokens) {
      selectedDocs.push(doc.text);
      tokenCount += docTokens;
    } else {
      break;
    }
  }

  return selectedDocs;
}

// Helper function to estimate tokens (actual implementation depends on your tokenizer)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // Simple approximation
}
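The length/4 heuristic is only a rough approximation for English text. For exact counts, swap in a real tokenizer; a sketch assuming the js-tiktoken package (verify the encoding matches your target model):

import { getEncoding } from 'js-tiktoken';

const enc = getEncoding('cl100k_base');

function countTokens(text: string): number {
  return enc.encode(text).length;
}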
Advanced Retrieval Strategy
For even better results, use a specialized query generated by an LLM:
async function advancedContextRetrieval(userQuery: string, maxTokens: number): Promise<string[]> {
  // Step 1: Generate an optimized lookup query using GPT-4o
  // (assumes the endpoint behind this client serves gpt-4o; you may need a separate OpenAI client)
  const lookupQueryResponse = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: "You are a query optimization expert. Generate the best possible search query to find relevant information based on the user's question."
      },
      {
        role: "user",
        content: `Generate a search query to find information about: ${userQuery}`
      }
    ]
  });

  // Fall back to the original query if generation returns no content
  const optimizedQuery = lookupQueryResponse.choices[0].message.content ?? userQuery;

  // Step 2: Get embeddings for the optimized query
  const queryEmbedding = await openai.embeddings.create({
    model: "morph-embedding-v2",
    input: optimizedQuery,
  });

  // Step 3: Find the top 20 similar documents (a larger initial set)
  const similarDocs = await vectorDb.similaritySearch(
    queryEmbedding.data[0].embedding,
    20
  );

  // Step 4: Rerank with the original query for better precision
  const rerankedDocs = await fetch('https://api.morphllm.com/v1/rerank', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer your-morph-api-key`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      query: userQuery, // Use the original query for reranking
      documents: similarDocs.map(doc => doc.text),
      top_n: similarDocs.length
    })
  }).then(res => res.json());

  // Step 5: Select documents that fit within the token limit
  let tokenCount = 0;
  const selectedDocs: string[] = [];

  for (const result of rerankedDocs.results) {
    const doc = similarDocs[result.index];
    const docTokens = estimateTokens(doc.text);
    if (tokenCount + docTokens <= maxTokens) {
      selectedDocs.push(doc.text);
      tokenCount += docTokens;
    } else {
      break;
    }
  }

  return selectedDocs;
}
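Usage mirrors the basic version; only the query-rewriting step differs. For example, from an async context (the query string is a placeholder):

const context = await advancedContextRetrieval(
  'Why does my auth middleware reject valid tokens?',
  16000
);
const prompt = `Context:\n${context.join('\n---\n')}\n\nQuestion: Why does my auth middleware reject valid tokens?`;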
Monitoring and Re-embedding with Morph SDK
To keep your embeddings fresh, use Morph’s SDK to monitor file changes and trigger re-embeddings:
import { MorphSDK } from '@morphllm/sdk';
import { watch, readFileSync } from 'fs';

const morph = new MorphSDK({
  apiKey: 'your-morph-api-key'
});

// Watch for file changes in your codebase
function setupFileWatcher(directory: string): void {
  watch(directory, { recursive: true }, async (eventType, filename) => {
    if (filename && (filename.endsWith('.js') || filename.endsWith('.ts'))) {
      console.log(`File changed: ${filename}`);

      // Read the file content
      const content = readFileSync(`${directory}/${filename}`, 'utf-8');

      // Generate a new embedding
      const embedding = await morph.embeddings.create({
        model: "morph-embedding-v2",
        input: content
      });

      // Update your vector database
      await vectorDb.updateEmbedding(filename, embedding.data[0].embedding, content);
    }
  });
}
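Note that fs.watch can fire several events for a single save, so in practice you will likely want to debounce re-embedding per file. A minimal sketch (the 500 ms delay is an arbitrary choice):

const pendingTimers = new Map<string, ReturnType<typeof setTimeout>>();

// Delay work until a file has been quiet for delayMs, resetting the timer on each new event
function debounce(filename: string, work: () => void, delayMs = 500): void {
  const existing = pendingTimers.get(filename);
  if (existing) clearTimeout(existing);
  pendingTimers.set(filename, setTimeout(() => {
    pendingTimers.delete(filename);
    work();
  }, delayMs));
}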
Testing shows that the advanced retrieval method typically yields significant improvements:
| Retrieval Strategy | Accuracy | Response Quality | Relative Performance |
| --- | --- | --- | --- |
| Basic Similarity | Good | Moderate | Baseline |
| Basic + Reranking | Better | Good | +10% |
| Advanced Method | Best | Excellent | +20% |
Best Practices
Chunk your content intelligently: Split documents at logical boundaries like functions or paragraphs (see the sketch after this list).
Balance precision vs. recall: Start with a larger set of documents before reranking to ensure you don't miss relevant information.
Consider document diversity: Include different types of context (code, documentation, examples) for more comprehensive responses.
Monitor and refine: Track which contexts lead to better responses and adjust your strategy accordingly.
Use domain-specific embeddings: When available, use embeddings trained on code or your specific domain.
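As a starting point for chunking, splitting on blank lines keeps paragraphs intact while packing them into token-bounded chunks; splitting code at function boundaries generally requires a language-aware parser. A simple paragraph-based sketch reusing the estimateTokens helper from above:

function chunkByParagraph(text: string, maxTokensPerChunk = 512): string[] {
  const paragraphs = text.split(/\n\s*\n/);
  const chunks: string[] = [];
  let current = '';
  for (const para of paragraphs) {
    const candidate = current ? `${current}\n\n${para}` : para;
    if (current && estimateTokens(candidate) > maxTokensPerChunk) {
      chunks.push(current); // Close the current chunk and start a new one
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}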
For access to our latest models, self-hosting, or business inquiries, please contact us at info@morphllm.com .