How LLM Settings Affect Prompt Engineering
Blog post from Vectorize
Communicating with a large language model (LLM) while creating and testing prompts typically happens through an API, and getting the output you want usually takes some trial and error with a handful of settings.

"Temperature" controls how predictable or creative responses are by adjusting how strongly the model favors the most likely next token: lower values produce more deterministic, focused output, while higher values let less likely tokens through and make responses more varied. "Top_p" (nucleus sampling) has a similar effect on determinism by restricting sampling to the smallest set of tokens whose combined probability reaches the chosen threshold. "Max length" caps the number of tokens the model generates, preventing excessively long outputs.

"Stop sequences" end a response as soon as the model produces a specified string, offering another way to control length and structure. "Frequency penalty" and "presence penalty" both reduce repetition by penalizing tokens that have already appeared: the frequency penalty grows with how often a token has been used, while the presence penalty applies a flat penalty once a token has appeared at all. The general recommendation is to adjust either the frequency penalty or the presence penalty, but not both.

Because the effect of these settings can vary across LLM versions, expect to experiment in order to tailor responses to the task at hand, whether that is fact-based question answering or creative writing.
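To make this concrete, below is a minimal sketch of how these settings typically appear in an API call, using the OpenAI Python SDK as one illustrative example; the model name and the specific values are assumptions for demonstration, and other providers expose similar parameters under slightly different names.

```python
# Illustrative sketch only: assumes the OpenAI Python SDK (openai>=1.0) and an
# example model name; adapt the client and parameter names for your provider.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",       # example model name (assumption)
    messages=[
        {"role": "user", "content": "List three facts about the Moon."}
    ],
    temperature=0.2,           # low temperature: predictable, fact-oriented output
    top_p=1.0,                 # nucleus sampling threshold; typically tune temperature or top_p, not both
    max_tokens=150,            # "max length": cap on the number of generated tokens
    stop=["\n\n"],             # stop sequence: end the response at the first blank line
    frequency_penalty=0.3,     # penalty that grows with how often a token has been used
    presence_penalty=0.0,      # flat penalty once a token appears; adjust one penalty, not both
)

print(response.choices[0].message.content)
```

For a creative-writing prompt, you might instead raise the temperature (for example to 0.9) and drop the stop sequence while leaving the other settings alone, which illustrates how the same parameters are tuned differently depending on the task.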