Understanding temperature, top-p, and top-k in LLMs
When generating text with large language models (LLMs), temperature, top-p (nucleus sampling), and top-k are parameters that control the randomness and diversity of the generated output. Each of these parameters reshapes the probability distribution from which the next token (word or subword) is sampled. Here’s a breakdown of how each parameter works internally:
1. Temperature
Temperature adjusts the probability distribution of the next token by scaling the logits (raw scores) output by the model before the softmax is applied: each logit is divided by the temperature, so values below 1 sharpen the distribution (the most likely token becomes even more likely) and values above 1 flatten it (less likely tokens get a better chance).
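A minimal sketch of this idea in plain Python (the function name and example logits are illustrative, not from any particular library):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide each logit by the temperature before exponentiating.
    # temperature < 1 sharpens the distribution; temperature > 1 flattens it.
    scaled = [logit / temperature for logit in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]

print(softmax_with_temperature(logits, temperature=1.0))  # baseline distribution
print(softmax_with_temperature(logits, temperature=0.5))  # sharper: top token dominates
print(softmax_with_temperature(logits, temperature=2.0))  # flatter: more uniform
```

As the temperature approaches 0, sampling converges to greedy decoding (always pick the highest-logit token); as it grows large, the distribution approaches uniform.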