Attention Mechanism
A technique that allows a model to focus on different parts of the input sequence when generating each output token. It is the core innovation behind the Transformer architecture, enabling models to capture context and relationships across long sequences.
Example:
When translating 'The cat sat on the mat', attention helps the model know that 'cat' should influence the translation of pronouns later in the sentence.
Think of it like:
Like a spotlight that can illuminate different parts of a stage: the model can 'pay attention' to the most relevant words when deciding what comes next.
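The spotlight intuition maps directly onto scaled dot-product attention: each token's query is compared against every key, the scores are normalized with a softmax, and the result weights a mix of the values. A minimal NumPy sketch (function name, toy data, and dimensions are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention, no masking: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V, weights  # output is a weighted mix of the values

# Toy self-attention: 3 tokens with 4-dimensional embeddings, Q = K = V
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(x, x, x)
print(w.round(2))  # each row shows how strongly one token attends to the others
```

Each row of the weight matrix is the "spotlight" for one token: a probability distribution over which other tokens to draw information from.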