Reinforcement learning (RL)

Learning policies via rewards or scores from interacting with an environment: core to games, robotics, and some RLHF-style LLM tuning.

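RL optimizes behavior by trial and error, using reward signals as feedback. Typical difficulties are delayed and sparse rewards and the exploration vs. exploitation trade-off; training often relies on simulations to collect enough experience.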
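To make delayed, sparse rewards and the exploration vs. exploitation trade-off concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. Everything in it is an assumption chosen for illustration, not something taken from a particular system: the toy "chain" environment, the function names, and the hyperparameter values.

```python
import random


def train_chain_q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                           epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning on a hypothetical chain: states 0..n_states-1.

    Actions: 0 = move left, 1 = move right. The only reward (1.0) is given
    on entering the last state, so the reward is sparse and delayed.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            # Exploration: with probability epsilon pick a random action.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                # Exploitation: pick a best-known action, ties broken randomly.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])

            # Deterministic toy dynamics; the left wall at state 0 is reflecting.
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0

            # Q-learning update: bootstrap from the best next-state value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state (ties resolve to 'right')."""
    return [0 if row[0] > row[1] else 1 for row in q[:-1]]


if __name__ == "__main__":
    q_table = train_chain_q_learning()
    print(greedy_policy(q_table))
```

In this sketch the reward only appears at the far end of the chain, so early episodes are close to a random walk; the epsilon term keeps the agent trying actions that currently look no better than the alternatives, and the discount factor gamma propagates the delayed reward back toward the start state.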

Modern game systems in the style of AlphaGo, and some LLM alignment pipelines, use RL stages alongside supervised learning on logged data. For example, RLHF trains a preference (reward) model and then optimizes the policy against it with PPO.
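As a rough illustration of the PPO step mentioned above, the clipped surrogate objective for a single sample can be sketched as below. The function name, the scalar inputs, and the default eps=0.2 are assumptions for illustration; practical implementations average this quantity over batches and usually add further terms, such as a value loss.

```python
def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate (to be maximized).

    ratio:     new_policy_prob / old_policy_prob for the taken action
    advantage: estimated advantage of that action
    eps:       clip range (illustrative default)
    """
    # Keep the probability ratio inside [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)
```

The clipping removes the incentive to move the policy far from the one that collected the data: once the ratio leaves the clip range in the direction that would improve the objective, the objective stops growing.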