Inference
Using a trained model to produce outputs on new inputs: after training is finished (also called deployment or forward pass in many setups).
Inference is run-time compute: given an input, run the forward pass to get predictions, text, embeddings, or actions. Cost drivers are latency, throughput, memory, and sometimes per-token pricing for hosted LLMs.
It is conceptually separate from training, though some systems interleave online learning rarely in production safety stacks.