Pierz Newton-John
1 min read · Jan 26, 2025


The way it works is that the model assigns a probability to each possible next token (you can think of a token as a word for simplicity’s sake). There’s always a most likely next word. For example, in the sequence “I sat down to read a”, the model’s most likely continuation is probably “book”, while less likely continuations would be “novel”, “newspaper” and “report”. If the temperature is zero, the model will always output “book”, but if you raise it a bit, you’ll sometimes also get “report”. If you raise it too far, you’ll start to get low-probability tokens like “cat”, and the model ceases to make sense. Sometimes these low-probability sequences can be interesting, but usually they are just weird. In fact, Midjourney calls its temperature parameter “weirdness”. My example of ChatGPT emitting valid new physics equations makes it clear why these sequences are limited in their “creativity”. The valid patterns in the outputs of the LLM come from patterns in the real world. Randomizing the output to generate less likely sequences does not help find new valid outputs; it just makes the output stranger, at the risk of losing sense altogether.
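To make that concrete, here’s a minimal Python sketch of temperature-scaled sampling. The scores and the `sample_with_temperature` helper are invented for illustration and don’t come from any particular model; real LLMs do essentially this over a vocabulary of tens of thousands of tokens.

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Pick a token from raw model scores (logits), scaled by temperature."""
    if temperature == 0:
        # Zero temperature: always pick the highest-scoring token ("book").
        return max(logits, key=logits.get)
    # Divide each score by the temperature, then apply a softmax.
    # Higher temperature flattens the distribution, so unlikely
    # tokens ("cat") get a real chance of being picked.
    scaled = {tok: s / temperature for tok, s in logits.items()}
    max_s = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Made-up scores for continuations of "I sat down to read a"
logits = {"book": 9.0, "novel": 7.5, "newspaper": 7.0, "report": 6.5, "cat": 1.0}

print(sample_with_temperature(logits, 0))    # always "book"
print(sample_with_temperature(logits, 0.7))  # mostly "book", sometimes "novel" or "report"
print(sample_with_temperature(logits, 2.0))  # occasionally "cat": the output gets weird
```

Nothing in this process adds new information; raising the temperature only redistributes probability toward continuations the model already considered unlikely.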
