LLM Configuration
When crafting and testing prompts, you typically interact with the large language model (LLM) through an API. A handful of configuration parameters determine how deterministic or varied the output is, and tuning them is important for improving the reliability and quality of responses. Experimentation is key to finding the optimal settings for your specific use cases. Below are the settings you will most commonly encounter across LLM providers (brief code sketches that put them together follow the list):
- Temperature: A lower temperature yields more deterministic output, since the most probable next token is always selected. Increasing the temperature introduces more randomness, encouraging diverse or creative completions. Lower values suit fact-based QA, where factual, concise responses are desirable; higher values can help with creative tasks such as poem generation.
- Top P: A nucleus-sampling parameter used alongside temperature, Top P controls how deterministic the model is. Only tokens within the top_p probability mass are considered, so a low Top P keeps answers exact and factual, while higher values allow more diverse responses. The general recommendation is to adjust either temperature or Top P, but not both.
- Top K: In contrast to temperature and Top P, Top K restricts sampling to a fixed number of candidates: the model ranks tokens by probability and considers only the K most likely at each step. This preserves a degree of determinism while still permitting creative diversity; adjusting K controls how many tokens are in play during sampling.
- Max Length: Limits the number of tokens the model generates. Capping the max length helps prevent long or irrelevant responses and keeps costs under control.
- Stop Sequences: Strings that halt token generation when produced, giving you control over response length and structure. For example, specifying "11" as a stop sequence makes the model stop before item 11 begins, limiting generated lists to no more than 10 items.
- Repetition Penalty: Reduces the model's tendency to repeat prompt text or get stuck in a loop. At each generation step the token probabilities are adjusted so that tokens which have already appeared in the text (in the prompt or the completion) are less likely to be generated again.
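
To make these settings concrete, here is a minimal sketch using the OpenAI Python SDK; the model name and parameter values are illustrative, not recommendations. Note that parameter names vary by provider: OpenAI splits the repetition penalty into frequency_penalty and presence_penalty and does not expose Top K.

```python
# Minimal sketch with the OpenAI Python SDK (pip install openai).
# Assumes OPENAI_API_KEY is set in the environment; model and values are illustrative.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",      # any chat-capable model
    messages=[
        {"role": "user", "content": "List the planets of the solar system."}
    ],
    temperature=0.2,          # low temperature -> more deterministic, factual output
    top_p=1.0,                # left at default; adjust temperature or top_p, not both
    max_tokens=256,           # Max Length: cap on generated tokens
    stop=["11"],              # Stop Sequence: cuts a numbered list off after 10 items
    frequency_penalty=0.5,    # penalizes tokens in proportion to how often they appeared
    presence_penalty=0.0,     # penalizes any token that has appeared at all
)

print(response.choices[0].message.content)
```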
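
For libraries that expose Top K and a single repetition penalty directly, a local-generation sketch with Hugging Face transformers might look like the following; the checkpoint and values are again only examples.

```python
# Sketch with Hugging Face transformers, where top_k and repetition_penalty
# are available as generate() arguments. "gpt2" stands in for any causal LM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Write a short poem about the sea:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.9,         # higher temperature for a creative task
    top_k=50,                # only the 50 most probable tokens are considered each step
    repetition_penalty=1.2,  # values > 1.0 penalize tokens that have already appeared
    max_new_tokens=80,       # Max Length for the completion
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```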