Stephen Wolfram’s longform article on ChatGPT is extremely informative for those looking to go a bit deeper. I’ve summarized the key points from that article below.
- ChatGPT is essentially trying to produce a sensible continuation of a prompt: at each step it picks a probable next word, appends it, and repeats
- Underpinning ChatGPT is a Large Language Model that, given the text so far, estimates a probability for every word in its vocabulary being the next one
- The ChatGPT model is trained in a self-supervised way (often loosely called unsupervised): it is fed billions of pieces of text with the final word masked and asked to guess that word. The model's parameters are repeatedly adjusted to reduce a loss function measuring how far its guesses fall from the actual words
- ChatGPT’s model also uses something called embeddings, where words, phrases and entire prompts are represented as points in a “meaning space,” so that items with similar meanings end up close together (e.g. alligators and crocodiles have similar embedding vectors)
- ChatGPT’s neural net is built on the Transformer architecture, whose attention mechanism lets it weight some parts of the prompt’s embedding more heavily than others. Practically speaking, this gives it a better grasp of the prompt’s meaning.
- On top of the next-word prediction training, there is a reinforcement learning from human feedback (RLHF) stage, where human raters score the model’s outputs and the model is tuned toward the responses they prefer
- An interesting thing about ChatGPT is how specific an output it can produce from a single short prompt, e.g. “write a poem about Joe Biden’s reelection campaign in the voice of Lin-Manuel Miranda”
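
The next-word loop in the first two bullets can be sketched as follows. The vocabulary and probabilities here are invented for illustration; a real model scores tens of thousands of tokens with a neural net rather than a fixed lookup.

```python
import random

def next_word_probs(prompt):
    # Hypothetical probability distribution over possible next words;
    # a real LLM computes this from the entire prompt.
    return {"mat": 0.5, "sofa": 0.3, "roof": 0.2}

def generate(prompt, steps, rng):
    words = prompt.split()
    for _ in range(steps):
        probs = next_word_probs(" ".join(words))
        # Sample a word in proportion to its probability, then append it
        # and feed the extended text back in for the next step.
        choice = rng.choices(list(probs), weights=list(probs.values()))[0]
        words.append(choice)
    return " ".join(words)

print(generate("The cat sat on the", 1, random.Random(0)))
```

Sampling rather than always taking the single most probable word is part of why the same prompt can yield different outputs on different runs.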
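
The guess-the-masked-word training in the third bullet needs a way to score how wrong a guess was; cross-entropy loss is the standard choice, sketched here with made-up probabilities:

```python
import math

def cross_entropy(predicted_probs, actual_word):
    # Loss is small when the model assigned high probability to the word
    # that actually appeared, and large when it assigned low probability.
    return -math.log(predicted_probs[actual_word])

# Hypothetical model output for "The cat sat on the ___"
probs = {"mat": 0.5, "sofa": 0.3, "roof": 0.2}
print(cross_entropy(probs, "mat"))   # smaller loss: a good guess
print(cross_entropy(probs, "roof"))  # larger loss: a worse guess
```

Training nudges the model’s parameters in the direction that shrinks this loss, averaged over billions of masked-word examples.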
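
The embedding idea can be made concrete with cosine similarity, the usual measure of closeness in meaning space. The three-dimensional vectors below are invented toy values; real embeddings have hundreds of dimensions learned during training.

```python
import math

# Toy embeddings: similar meanings get nearby vectors (invented values).
embeddings = {
    "alligator": [0.90, 0.80, 0.10],
    "crocodile": [0.85, 0.82, 0.12],
    "teacup":    [0.10, 0.05, 0.90],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Near 1.0 for similar meanings, much lower for unrelated ones.
print(cosine_similarity(embeddings["alligator"], embeddings["crocodile"]))
print(cosine_similarity(embeddings["alligator"], embeddings["teacup"]))
```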
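
The attention mechanism at the heart of the Transformer can be sketched as scaled dot-product attention for a single query vector; the key and value vectors below are toy values, not anything from a real model.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Score each position by how well its key matches the query,
    # scaled by sqrt(dimension) for numerical stability.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Blend the value vectors: positions with higher scores ("more
    # attention") contribute more to the output.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# The query matches the first key best, so the output leans toward
# the first value vector.
print(attention([1.0, 0.0], keys, values))
```

This selective weighting is what lets the model focus on the parts of the prompt that matter most for predicting the next word.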