Reinforcement learning (RL)
Learning policies via rewards or scores from interacting with an environment: core to games, robotics, and some RLHF-style LLM tuning.
RL optimizes behavior from trial-and-error signals. Typical difficulties are sparse or delayed rewards and the exploration vs. exploitation trade-off; training often relies on simulation.
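As an illustration of delayed rewards and the exploration vs. exploitation trade-off, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. The five-state chain environment, the function names (`step`, `train`, `greedy_policy`), and all hyperparameter values are assumptions made for this example, not something the entry specifies.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal (hypothetical toy setup)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Toy chain environment: the only reward is 1.0 on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; returns a table q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick the best-known action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Bootstrapped target: the delayed goal reward propagates backward
            # through the discounted value of the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Best action for each non-terminal state under the learned table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` gives the action the agent prefers in each non-terminal state. Because the reward arrives only at the goal, early episodes amount to a random walk, and the values of earlier states become informative only after the reward has propagated back through repeated updates.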
Modern LLM alignment sometimes includes RL-style stages, e.g., RLHF driven by a learned preference model, alongside supervised training on logged data.
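To make the preference-model idea concrete, here is a sketch of one commonly used pairwise objective, a Bradley-Terry style loss over scalar reward scores. The entry does not say which objective is meant, so the choice of loss, the function name `preference_loss`, and the scalar inputs are assumptions for illustration only.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)) for one comparison.

    The inputs stand in for the scores a reward model (hypothetical here)
    assigns to the preferred and the rejected response.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on sign to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss is small when the preferred response scores well above the rejected one and grows roughly linearly when the ordering is reversed, so minimizing it pushes the scores toward agreement with the recorded preferences.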