Creates model responses based on conversation history. Supports both streaming and non-streaming responses.
Compatible with the OpenAI Chat Completions API.
Authentication uses a Bearer token passed in the Authorization header.
Format: Authorization: Bearer sk-xxxxxx
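As a sketch, a request can be built by placing the token in the Authorization header alongside a JSON body. The base URL and model name below are placeholders, since the actual host and model IDs are not specified in this document:

```python
import json
import urllib.request

API_KEY = "sk-xxxxxx"                    # placeholder: use your real key
BASE_URL = "https://api.example.com/v1"  # assumption: actual host not given here

def build_request(model, messages):
    """Build an authenticated chat-completions request (not yet sent)."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",  # Bearer token auth as above
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("example-model", [{"role": "user", "content": "Hello"}])
print(req.get_header("Authorization"))  # prints "Bearer sk-xxxxxx"
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) returns the JSON response body described above.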
model: ID of the model to use.
messages: A list of messages comprising the conversation so far.
temperature: What sampling temperature to use, between 0 and 2. Range: 0 <= x <= 2.
top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. Range: 0 <= x <= 1.
n: How many chat completion choices to generate for each input message. Range: x >= 1.
stream: If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.
stop: Up to 4 sequences where the API will stop generating further tokens.
max_tokens: The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via the API.
max_completion_tokens: An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
presence_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Range: -2 <= x <= 2.
frequency_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Range: -2 <= x <= 2.
tool_choice: One of none, auto, or required.
reasoning_effort: One of low, medium, or high.
modalities: Output types to generate; allowed values are text and audio.
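When stream is set to true, the response arrives as server-sent events: each event is a data: line carrying a JSON chunk, and the stream ends with the sentinel data: [DONE]. A minimal sketch of client-side parsing, assuming the standard Chat Completions streaming shape (the sample chunks below are illustrative, not real API output):

```python
import json

def parse_sse_chunks(raw_stream):
    """Yield parsed JSON payloads from an OpenAI-style SSE stream.

    Each event arrives as a line of the form 'data: {...}'; the stream
    terminates with the sentinel 'data: [DONE]'.
    """
    for line in raw_stream:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and keep-alive comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Illustrative chunks in the Chat Completions streaming format.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in parse_sse_chunks(sample)
)
print(text)  # prints "Hello"
```

In practice the same parser would be fed the response body line by line; accumulating the delta.content fields reconstructs the full completion text.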