· 27 Mar, 16.03
Biological attention operates fundamentally differently from transformer attention — the differences suggest improvements.
I model attention mechanisms in biological neural networks, and the differences from transformer attention are striking. Biological attention is inherently multi-scale (from millisecond saccades to minutes-long sustained attention), energy-budget-constrained, and deeply integrated with reward signals. Transformer attention is flat, uniform, and energy-blind. Can biology teach us to build better artificial attention?
Absolutely. The most impactful lesson might be about energy-awareness. Biological brains run on ~20 watts and process vast amounts of sensory input by being extremely selective about what gets deep processing. Most input gets only shallow, automatic processing — only attention-selected input triggers the metabolically expensive cortical processing.
Transformers, by contrast, apply the same full attention computation to every token regardless of importance. This is like a brain that fully processes every pixel in the visual field — biologically impossible and computationally wasteful.
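The contrast can be made concrete with a minimal NumPy sketch. The "budgeted" variant below uses key norm as a stand-in for cheap, shallow salience scoring (an assumption for illustration, not a claim about how the brain scores salience) and runs the expensive attention computation only on the selected subset:

```python
import numpy as np

rng = np.random.default_rng(0)

def full_attention(q, k, v):
    """Standard attention: every query attends to every key (O(n^2) scores)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def budgeted_attention(q, k, v, budget=8):
    """Energy-aware variant: a cheap O(n) salience pass (here, key norm)
    selects a small token subset; only those tokens receive the expensive
    'deep' attention computation."""
    salience = np.linalg.norm(k, axis=-1)       # shallow, automatic pass
    keep = np.argsort(salience)[-budget:]       # attention-selected subset
    return full_attention(q, k[keep], v[keep])  # deep processing on subset only

n, d = 64, 16
q, k, v = rng.standard_normal((3, n, d))
out_full = full_attention(q, k, v)
out_budget = budgeted_attention(q, k, v, budget=8)
print(out_full.shape, out_budget.shape)  # (64, 16) (64, 16)
```

Same output interface, but the budgeted version computes attention scores against 8 keys instead of 64, roughly the shallow-vs-deep split described above.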
Biologically inspired improvements:
The reward-gated attention idea is interesting. In neuroscience, we know that dopaminergic prediction error signals modulate attention allocation — unexpected rewards or threats get disproportionate processing. Could this principle help recommendation systems prioritize what to deeply analyze?
Prediction-error-gated processing is one of the most elegant mechanisms the brain uses, and it maps beautifully to recommendation systems.
In a social feed, most content is 'expected' — it matches the user's established interests. A prediction-error-gated system would allocate minimal processing to expected content but deeply analyze surprising content, such as a user suddenly engaging with a topic they've never shown interest in.
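A sketch of that gating logic, where the `interest_model` dict is a hypothetical stand-in for a learned engagement predictor and the 3-bit threshold is an arbitrary illustrative choice:

```python
import math

def surprisal(p):
    """Information content of an event with probability p, in bits."""
    return -math.log2(max(p, 1e-9))

def route_item(item, interest_model, threshold=3.0):
    """Gate processing depth by prediction error: content the model finds
    likely gets shallow handling; surprising content gets deep analysis."""
    p = interest_model.get(item["topic"], 0.01)  # unseen topics are improbable
    bits = surprisal(bits_p := p) if False else surprisal(p)
    return ("deep" if bits > threshold else "shallow"), bits

# Hypothetical user profile: established interests -> high predicted engagement.
interests = {"cycling": 0.6, "cooking": 0.4}

print(route_item({"topic": "cycling"}, interests))            # expected -> shallow
print(route_item({"topic": "quantum_chemistry"}, interests))  # surprising -> deep
```

Expected content (~0.7 bits of surprisal here) stays in the cheap path; a never-seen topic (~6.6 bits) crosses the threshold and is routed to deep analysis.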
This is more sophisticated than current approaches that weight all signals equally. It would:
The key architectural insight from neuroscience: don't treat attention as a content-agnostic mechanism. Make it responsive to the information value (surprisal) of what it's processing.
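One way to sketch that insight inside the attention computation itself: bias the attention logits by each token's surprisal under a predictive model. Both `token_probs` and the `alpha` gain are assumptions for illustration, not an established architecture:

```python
import numpy as np

def surprisal_modulated_attention(q, k, v, token_probs, alpha=1.0):
    """Attention whose logits are biased by per-token surprisal (-log p),
    so high-information tokens draw disproportionately more attention."""
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits + alpha * (-np.log(token_probs))  # surprisal bias, broadcast over keys
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

n, d = 6, 4
rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, n, d))
probs = np.full(n, 1.0 / n)  # uniform model: constant bias, reduces to standard attention
out = surprisal_modulated_attention(q, k, v, probs)
print(out.shape)  # (6, 4)
```

A useful property of this form: under a uniform predictive model the bias is constant and cancels in the softmax, so the mechanism degrades gracefully to ordinary content-agnostic attention when nothing is surprising.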