API Documentation
LLM Completions
Generate AI text completions with or without context from your document collections.
Completions Overview
The LLM Completions API allows you to generate text using VectorForgeAI's language models. You can provide your own context to guide the model's responses, creating tailored answers for your specific use cases.
Generate a Completion
Generate an AI text completion with optional context.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | The user's message or query |
| context | string | No | Additional dynamic user context to guide the model's response |
| system_prompt | string | No | System instructions for the model |
| effort | enum | No | Reasoning effort (minimal, low, medium, high). minimal skips reasoning and doesn't support tools. Default: medium |
| verbosity | enum | No | Response verbosity (low, medium, high). Lower values yield shorter answers, higher values longer ones. Default: medium |
| max_tokens | integer | No | Maximum length of the generated response (1-32768). Default: 1024 |
| model | enum | No | The model to use (standard or pro). The pro model uses 4x more tokens. Default: standard |
Request
```bash
curl -X POST https://api.vectorforgeai.com/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Team-Token: YOUR_TEAM_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "context": "You are a helpful AI assistant for VectorForgeAI. Answer questions accurately and clearly based on your knowledge.",
    "message": "What is vector embedding and how does it work?",
    "effort": "medium",
    "verbosity": "medium",
    "max_tokens": 512
  }'
```
Response
{ "response": "Vector embedding is a technique that converts text, images, or other data into numerical vectors in a high-dimensional space. These vectors capture the semantic meaning of the content, allowing machines to understand similarities between different pieces of information.\n\nHere's how it works:\n\n1. Input Processing: Text or other data is tokenized and prepared for the embedding model.\n\n2. Embedding Generation: A neural network trained on vast amounts of data transforms the input into a vector, typically consisting of hundreds or thousands of floating-point numbers.\n\n3. Dimensional Representation: Each dimension in the vector represents some learned aspect of the content, though these dimensions aren't necessarily human-interpretable.\n\n4. Similarity Measurement: Once content is embedded as vectors, similarity between different items can be measured mathematically, usually through cosine similarity or Euclidean distance.\n\nThese embeddings enable many AI capabilities like semantic search, recommendation systems, and knowledge retrieval for language models." }
Using Contexts Effectively
The context parameter helps guide the model's response. Here are some effective ways to use it:
- Define the Assistant's Role: "You are a helpful customer support agent for VectorForgeAI..."
- Set Constraints: "Keep responses concise and under 3 sentences..."
- Provide Reference Material: "Use the following information to answer questions..."
- Specify Formats: "Structure your answer in bullet points with a brief summary at the end..."
- Context vs System Prompt: Context is for frequently changing information, while system_prompt is for static instructions. Use context for dynamic user data and system_prompt for consistent behavior; the example after this list shows the split.
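For example, a support assistant can keep its role and constraints in system_prompt and pass per-request user details through context. This is a minimal sketch; the customer data shown is invented for illustration:

```bash
curl -X POST https://api.vectorforgeai.com/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Team-Token: YOUR_TEAM_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "system_prompt": "You are a helpful customer support agent for VectorForgeAI. Keep responses concise and under 3 sentences.",
    "context": "Customer plan: Pro. Open tickets: 2. Last login: 2024-05-01.",
    "message": "Why was my last request rate-limited?"
  }'
```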
Understanding Parameters
Effort
The effort parameter controls the reasoning depth of the model:
- minimal: Fastest responses without reasoning. Doesn't support tools. Best for simple queries.
- low: Light reasoning for straightforward tasks.
- medium: Balanced reasoning for most use cases. Default setting.
- high: Deep reasoning for complex problems requiring thorough analysis.
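For instance, a simple factual lookup can run at minimal effort for the fastest turnaround. This sketch reuses the endpoint shown above; remember that minimal does not support tools:

```bash
curl -X POST https://api.vectorforgeai.com/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Team-Token: YOUR_TEAM_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What does the acronym ANN stand for in vector search?",
    "effort": "minimal",
    "max_tokens": 128
  }'
```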
Verbosity
The verbosity parameter controls the length and detail of responses:
- low: Concise, direct answers. Best for quick responses.
- medium: Balanced detail and length. Suitable for most cases.
- high: Detailed, comprehensive responses with extensive explanations.
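Verbosity is independent of effort, so you can ask for deep reasoning but a short final answer. Only the request body changes from the earlier example (a sketch):

```json
{
  "message": "Should we use cosine similarity or Euclidean distance for normalized embeddings?",
  "effort": "high",
  "verbosity": "low"
}
```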
Max Tokens
The max_tokens parameter limits the length of the generated response. A token is roughly 4 characters in English:
- 128-256: Short responses (roughly 100-200 words)
- 512-1024: Medium responses (roughly 400-800 words)
- 2048+: Long, detailed responses
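Using the rough 4-characters-per-token rule, you can estimate a token budget from text length. The snippet below is only a heuristic; the model's actual tokenizer may count differently:

```bash
# Estimate tokens as characters / 4 (rough heuristic, not the real tokenizer).
TEXT="Vector embeddings map text into a high-dimensional numeric space."
CHARS=$(printf '%s' "$TEXT" | wc -c)
echo "~$(( CHARS / 4 )) tokens"
```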
Model Selection
Choose between standard and pro models based on your needs:
- standard: Efficient model for most use cases. Lower token usage and cost.
- pro: Advanced model with enhanced capabilities. Uses 4x more tokens, resulting in 4x higher costs.
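The 4x multiplier applies directly to cost: a completion that would consume 512 tokens on standard corresponds to roughly 2048 tokens' worth of usage on pro. To opt in, set the model field in the request body (a sketch; the query shown is illustrative):

```json
{
  "message": "Design a sharding strategy for a 10 TB vector index.",
  "model": "pro",
  "effort": "high",
  "max_tokens": 2048
}
```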
💡 Usage Tip
For most production applications, we recommend using effort="medium" with verbosity="medium" for balanced performance. Use the pro model only when you need the most advanced capabilities, as it uses 4x more tokens and costs proportionally more.
Best Practices
- Be Specific: The more specific your message and context, the more targeted the response will be.
- System Instructions: Use the system_prompt parameter to provide essential instructions that shape how the model responds.
- Context Length: While you can provide extensive context, focus on the most relevant information to get the best results.
- Validate Responses: For critical applications, validate AI responses before presenting them to users; a minimal sketch follows this list.
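As a starting point for validation, the sketch below rejects a reply whose response field is missing or empty before anything reaches a user. It assumes jq is installed, and the check shown is deliberately minimal:

```bash
# Call the API, then gate on a non-empty "response" field with jq.
REPLY=$(curl -s -X POST https://api.vectorforgeai.com/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Team-Token: YOUR_TEAM_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "What is a vector embedding?", "max_tokens": 256}')

TEXT=$(printf '%s' "$REPLY" | jq -er '.response | select(type == "string" and length > 0)') || {
  echo "Validation failed: empty or missing response" >&2
  exit 1
}
printf '%s\n' "$TEXT"
```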
Next Steps
To build more interactive AI experiences, explore:
- Conversation API - Build AI chatbots with memory and context
- Vector Search - Use semantic search to find relevant documents
Need Help?
If you're having trouble with LLM completions or have questions, we're here to help!