This endpoint generates completions from a Large Language Model. It is a simple proxy that forwards your requests to the desired model. Every LightOn model is deployed on a vLLM-based image.

Response Types:
stream=false (default): Returns a complete JSON response with all completion choices
stream=true: Returns Server-Sent Events (SSE) with incremental completion chunks
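For the default non-streaming case, a request looks like the minimal sketch below. The base URL, model name, and environment variable are placeholders, and the /v1/completions path assumes the OpenAI-compatible routing that vLLM exposes; substitute your deployment's values.

```python
import os

import requests

# Placeholder URL and model name; replace with your deployment's values.
API_URL = "https://api.lighton.ai/v1/completions"
headers = {"Authorization": f"Bearer {os.environ['LIGHTON_API_KEY']}"}

payload = {
    "model": "my-model",          # must exist and be configured in the admin
    "prompt": "Once upon a time",
    "max_tokens": 64,
    "stream": False,              # default: one complete JSON response
}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
resp.raise_for_status()
data = resp.json()
print(data["choices"][0]["text"])
```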
Streaming Format:
Each SSE event contains a JSON object with incremental text. The stream ends with data: [DONE].

Authentication: Bearer token
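Consuming the stream then reduces to reading SSE lines until the [DONE] sentinel. A minimal sketch using requests, with the same placeholder URL, key, and model as above:

```python
import json
import os

import requests

API_URL = "https://api.lighton.ai/v1/completions"  # assumed endpoint path
headers = {"Authorization": f"Bearer {os.environ['LIGHTON_API_KEY']}"}
payload = {"model": "my-model", "prompt": "Once upon a time", "stream": True}

with requests.post(API_URL, json=payload, headers=headers, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip keep-alives and non-data lines
        data = line[len("data: "):]
        if data == "[DONE]":      # sentinel marking the end of the stream
            break
        chunk = json.loads(data)  # one JSON object with incremental text
        print(chunk["choices"][0]["text"], end="", flush=True)
```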
Request serializer for the completions endpoint. Parameters (OpenAI-compatible names):

model: Model to use for generating completions; must exist and be configured in the admin
prompt: The prompt to generate completions for
max_tokens: Maximum number of tokens to generate
temperature: Sampling temperature between 0 and 2
top_p: Nucleus sampling parameter
n: Number of completions to generate
stream: Whether to stream back partial progress
logprobs: Include the log probabilities on the logprobs most likely tokens
echo: Echo back the prompt in addition to the completion
stop: Up to 4 sequences where the API will stop generating further tokens
presence_penalty: Penalty for new tokens based on whether they appear in the text so far
frequency_penalty: Penalty for new tokens based on their existing frequency in the text
best_of: Generates multiple completions server-side and returns the best
logit_bias: Modify the likelihood of specified tokens appearing in the completion
user: A unique identifier representing your end-user
suffix: The suffix that comes after a completion of inserted text
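To make the sampling controls concrete, here is an illustrative payload exercising most of them. The values are arbitrary, and the token ID in logit_bias is only an example; field names assume the OpenAI-compatible schema noted above.

```python
# Illustrative payload; adjust values for your model and use case.
payload = {
    "model": "my-model",
    "prompt": "Write a haiku about the sea",
    "max_tokens": 32,
    "temperature": 0.7,             # 0 to 2; higher means more random
    "top_p": 0.9,                   # nucleus sampling cutoff
    "n": 2,                         # return two completions
    "best_of": 4,                   # sample four server-side, keep the best
    "stop": ["\n\n"],               # up to 4 stop sequences
    "presence_penalty": 0.5,        # penalize tokens already present
    "frequency_penalty": 0.5,       # penalize tokens by their frequency
    "logit_bias": {"50256": -100},  # token ID -> bias; -100 suppresses it
    "user": "user-1234",            # end-user identifier
}
```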
Successful response
Response serializer for completions endpoint results. Fields (OpenAI-compatible names):

id: Unique identifier for the completion
object: Object type, always 'text_completion'
created: Unix timestamp of when the completion was created
model: The model used for generating the completion
choices: List of completion choices generated by the model
usage: Usage statistics for the completion request
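Put together, a non-streaming response has roughly the following shape. The sub-fields of choices and usage are assumptions based on the OpenAI-compatible schema rather than fields documented above.

```python
# Sketch of a non-streaming response body, reconstructed from the fields
# above; choice and usage sub-fields are assumed, not documented here.
example_response = {
    "id": "cmpl-abc123",
    "object": "text_completion",
    "created": 1700000000,  # Unix timestamp
    "model": "my-model",
    "choices": [
        {
            "index": 0,
            "text": " and they lived happily ever after.",
            "logprobs": None,
            "finish_reason": "stop",
        },
    ],
    "usage": {"prompt_tokens": 4, "completion_tokens": 9, "total_tokens": 13},
}
```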