Chat Completions
Create Chat Completion
POST /v1/chat/completions
Generate a chat completion using the chosen large language model (LLM).
The request body follows the same format as OpenAI's chat completion request, and the response will be an OpenAI-style chat completion object. When stream: true is set, the response is delivered as a stream of chat completion chunk objects. For LLMs other than those OpenAI supports, SimpliML automatically adapts the parameters; any parameters a given LLM does not support are excluded from the request.
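Below is a minimal sketch of a non-streaming request using Python's requests library. The base URL, API key placeholder, and Bearer-token auth scheme here are assumptions for illustration; substitute the endpoint and credentials from your SimpliML account.

```python
import requests

# Assumed base URL and placeholder key; replace with your SimpliML values.
BASE_URL = "https://api.simpliml.com"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-3.5-turbo",  # any model ID your deployment supports
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is nucleus sampling?"},
        ],
        "max_tokens": 100,
        "temperature": 0.7,
    },
)
response.raise_for_status()

# The response follows OpenAI's chat completion object format.
print(response.json()["choices"][0]["message"]["content"])
```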
Request Schema
Option | Type | Description | Default | Required |
---|---|---|---|---|
messages | array | A list of messages comprising the conversation so far. Messages should alternate roles: system, user, assistant, user, and so on. | - | Yes |
model | string | ID of the model to use. | - | Yes |
max_tokens | number | The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. | 100 | No |
temperature | float | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. | 1.0 | No |
top_k | number | Integer between 1 and 50 that controls the number of top tokens to consider. Set to -1 to consider all tokens. | 40 | No |
top_p | float | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. | 0.92 | No |
repetition_penalty | float | Penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens. | 1.0 | No |
stream | boolean | If set, partial message deltas will be sent, like in ChatGPT. Tokens are sent as data-only server-sent events as they become available. | false | No |
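With stream: true, the endpoint returns data-only server-sent events in the same shape as OpenAI's chat completion chunks. The following sketch consumes that stream, again assuming the illustrative base URL and Bearer-token auth from the example above:

```python
import json
import requests

BASE_URL = "https://api.simpliml.com"  # assumed; use your SimpliML endpoint
API_KEY = "YOUR_API_KEY"

with requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Tell me a short joke."}],
        "stream": True,
    },
    stream=True,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        decoded = line.decode("utf-8")
        # Each event arrives as a "data: {...}" line; skip blanks and comments.
        if not decoded.startswith("data: "):
            continue
        payload = decoded[len("data: "):]
        # OpenAI-style streams terminate with a "data: [DONE]" sentinel.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)

print()  # final newline after the streamed output
```

Reading the stream incrementally lets you render tokens as they arrive rather than waiting for the full completion, which is the main reason to prefer stream: true in interactive applications.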