AI Glossary

RLHF

Reinforcement Learning from Human Feedback

RLHF is a method for fine-tuning a model: people rate the model's responses, and the model learns to prefer the higher-rated ones. This makes the model more helpful and better aligned with users' expectations.

RLHF is a fine-tuning stage that follows the initial training of a large language model. People rate or compare the model's responses, and from those ratings a reward model is built. The language model is then fine-tuned so that it more often generates responses people would rate highly.
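
The two steps described above (building a reward model from human comparisons, then shifting the model toward responses that score highly) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two hand-made features (clarity, rudeness), the example comparisons, the linear reward model, and the three fixed candidate responses standing in for a language model. Production RLHF typically trains a neural reward model and updates the language model with an algorithm such as PPO plus a penalty for drifting from the original model; this sketch uses a plain gradient update on expected reward instead.

```python
import math

# Hypothetical toy data: each response is described by two made-up features,
# (clarity, rudeness). Each pair is (preferred, rejected) as judged by a person.
comparisons = [
    ((0.9, 0.1), (0.2, 0.8)),
    ((0.7, 0.0), (0.6, 0.9)),
    ((0.8, 0.2), (0.1, 0.3)),
    ((0.5, 0.1), (0.5, 0.7)),
]

def reward(weights, features):
    """Linear reward model: one scalar score per response."""
    return sum(w * x for w, x in zip(weights, features))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_reward_model(pairs, steps=500, lr=0.5):
    """Fit weights by gradient descent on the pairwise loss
    -log(sigmoid(reward(preferred) - reward(rejected)))."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            scale = -(1.0 - sigmoid(margin))  # derivative of the loss w.r.t. margin
            for i in range(len(weights)):
                grad[i] += scale * (preferred[i] - rejected[i])
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def finetune_policy(rewards, steps=200, lr=1.0):
    """Toy 'policy': a softmax over fixed candidate responses.
    Each step follows the exact gradient of the expected reward."""
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(p * r for p, r in zip(probs, rewards))
        logits = [l + lr * p * (r - baseline)
                  for l, p, r in zip(logits, probs, rewards)]
    return softmax(logits)

# Step 1: build the reward model from human comparisons.
weights = train_reward_model(comparisons)

# Step 2: shift the policy toward responses the reward model scores highly.
candidates = {
    "clear and polite": (0.9, 0.1),
    "vague": (0.2, 0.2),
    "clear but rude": (0.8, 0.9),
}
names = list(candidates)
rewards = [reward(weights, candidates[n]) for n in names]
probs = finetune_policy(rewards)
best = names[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

With this toy data the learned weights reward clarity and penalize rudeness, so the policy ends up favoring the clear, polite candidate. Note that the policy only reweights responses it could already produce, which mirrors the point that RLHF shapes behavior rather than adding knowledge.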

The goal is to align the model's behavior with users' expectations: clearer, safer and more helpful answers. RLHF does not, however, eliminate hallucinations, and it is not a way to add new factual knowledge: it applies previously collected human judgments (human-in-the-loop) on top of patterns the model has already learned.
