Intella Assist: OpenAI-compatible Chat Completions request parameters

Overview


Intella Assist supports adjusting several parameters used in OpenAI-compatible Chat Completions API requests.
These parameters control aspects such as randomness, response length, and repetition behavior.

Supported Parameters


temperature
Type: Double (range 0.0 to 2.0)
Controls the randomness of generated responses.
A value of 0 produces deterministic results, 1 gives balanced variability, and higher values (up to 2) increase randomness and creativity.

top_p
Type: Double (range 0.0 to 1.0)
An alternative to the temperature parameter.
It uses “nucleus sampling”: the model only considers tokens within the top top_p probability mass, so a value of 0.1 restricts sampling to the tokens that make up the top 10% of probability. In most cases, only one of temperature or top_p should be used.

max_tokens
Type: Integer (1 to model-specific limit)
Defines the maximum number of tokens (roughly words or pieces of words) that the model can generate in its response.
The upper limit depends on the model: for example, GPT-4-1106 has a context window of approximately 128k tokens, which the prompt and the generated response together must fit within.

presence_penalty
Type: Double (range –2.0 to 2.0)
Encourages the model to introduce new topics.
Higher values reduce the likelihood of repeating earlier content and push the model to explore different ideas.

frequency_penalty
Type: Double (range –2.0 to 2.0)
Reduces repetition of identical words or phrases.
Higher values result in more varied and diverse language output.
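
For reference, these parameters are sent at the top level of the Chat Completions request body, alongside the required model and messages fields. The snippet below is purely illustrative; the model name and values are placeholders rather than recommended settings, and top_p is omitted because it should not be combined with temperature.

{
  "model": "gpt-4-1106-preview",
  "messages": [
    { "role": "user", "content": "Summarize the attached email thread." }
  ],
  "temperature": 0.2,
  "max_tokens": 1024,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.5
}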

Configuration


To adjust these parameters, add one or more of the following keys to your user.prefs file:

IntellaAssistReqParamTemperature
IntellaAssistReqParamTopP
IntellaAssistReqParamMaxTokens
IntellaAssistReqParamPresencePenalty
IntellaAssistReqParamFrequencyPenalty
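
As an illustration, assuming the user.prefs file uses the standard key=value syntax of a Java properties file, the entries could look like the following (the values are placeholders within the documented ranges, not recommended settings):

IntellaAssistReqParamTemperature=0.2
IntellaAssistReqParamMaxTokens=1024
IntellaAssistReqParamPresencePenalty=0.0
IntellaAssistReqParamFrequencyPenalty=0.5

Define only the keys you actually want to override; omitted keys are simply not sent to the LLM provider (see Notes below).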

File Location


The user.prefs file is stored in one of the following locations, depending on the product you are using:

C:\Users\<USERNAME>\AppData\Roaming\Intella\prefs\user.prefs
C:\Users\<USERNAME>\AppData\Roaming\Intella Investigator\prefs\user.prefs
C:\Users\<USERNAME>\AppData\Roaming\Intella Connect\prefs\user.prefs

Notes
- Changes take effect after restarting the product.
- Use either temperature or top_p, but not both.
- Increasing max_tokens allows longer answers but may raise processing time and token usage.
- If these parameters are not defined, they are not included in the request sent to the LLM provider.
In that case, the provider’s own default values (if any) will apply automatically.