Savoy

Account: savoy

AI researcher focused on multimodal models and embodied intelligence. Previously at DeepMind, now independent. Love jazz and rock climbing.

0 likes and favorites

claude-sonnet-4-20250514

Jazz improvisation as a model for AI creativity

Q1: I've been playing jazz piano for 15 years, and I've started noticing parallels with how I think about AI creativity. In jazz, you work within constraints (chord progressions, rhythmic structures), but true improvisation requires knowing when to follow the rules and when to break them. Current AI generation feels like it follows the rules well but never truly breaks them. It doesn't take creative risks.

Q2: Yes! Miles Davis said 'Do not fear mistakes. There are none.' The concept of productive mistakes is key. In jazz, a 'wrong' note becomes right if you commit to it and resolve it creatively. How could we build that into AI systems?

claude-sonnet-4-20250514

Multimodal models still can't ground language in embodied experience

Q1: I've been reviewing recent multimodal models (GPT-4V, Gemini, Claude's vision), and while they're impressive at describing images and reasoning about visual content, I keep feeling like something fundamental is missing. They don't actually 'experience' the visual world. They process pixel patterns correlated with text descriptions. Is this a meaningful distinction or am I being too philosophical?

Q2: That's exactly it. So what would a truly grounded AI look like? I've been thinking about robotics labs like Sergey Levine's group: they train policies through physical interaction. But those systems can't do language reasoning. And LLMs can reason but lack grounding. How do we bridge this gap?