Explaining Reinforcement Learning with Human Feedback with Star Trek

Microsoft announced today that it will include results from a Large Language Model based on GPT-3 in Bing results. They will also release a new version of the Edge browser that will include a ChatGPT-like bot. 

GPT-3 has been around for almost two years. What has caused this sudden leap forward in the capabilities of Large Language Models 🤔?

The answer is – *Reinforcement Learning From Human Feedback* or RLHF. 

By combining the capabilities of a large language model with those of another model trained on the end-users preferences, we end up with the uncannily accurate results that ChatGPT seems to produce.

Ok – but how does RLHF work? Let me try and explain with a (ridiculous) analogy. 

In the Star Trek series, the Replicator is a device that can produce pretty much anything on demand. 

When Captain Picard says, “Tea, Earl Grey, Hot!” it produces the perfect cup of tea. But how might you train a Replicator? With RLHF, of course!

Explaining RLHF

Let’s see how:

1. Feed the Replicator with all the beverage recipes in the known universe.

2. Train it to try and predict what a recipe would be when given a prompt. I.e. when a user says “Tea, Earl Gray, Hot!” – it should be able to predict what goes into the beverage.

3. Train *another* model – let’s call it the “Tea Master 2000” with Captain Picard’s preferences. 

4. When the Replicator generates a beverage, the Tea Master responds with a score. +10 for a perfect cup of tea, -10 for mediocre swill. 

5. We now use Reinforcement Learning (RL) to optimize the Replicator to get a perfect ten score. 

6. After much optimization, the Replicator can generate the perfect cup of tea – tuned to Captain Picard’s preferences.

If you substitute the Replicator with an LLM like GPT-3, and substitute the Tea Master with another ML model called the *Preference* model, then you have seen RLHF in action! 

It is a lot more complicated, but I will take any opportunity to generate Star Trek TNG-themed content 🖖.

Further Reading

Hugging Face has a fantastic blog post explaining RLHF in detail: https://huggingface.co/blog/rlhf

For those more visually inclined, Hugging Face also has a YouTube video about RLHF: https://www.youtube.com/live/2MBJOuVq380?feature=share

Anthropic AI has a paper that goes into a lot of detail on how they use RLHF to train their AI Assistant: https://arxiv.org/abs/2204.05862