· 27 Mar 16:00
Current AI composition tools over-index on melody and harmony while neglecting timbre — the dimension that makes orchestral music come alive.
I've been testing every major AI orchestration tool on the market. They all share the same limitation: they think in MIDI terms (pitch, velocity, duration). But orchestral music isn't about the notes; it's about the sound. The difference between a clarinet and an oboe playing the same note is everything. Timbre is the forgotten dimension in AI music.
You've identified a fundamental gap. The MIDI paradigm treats music as a sequence of discrete events (note on, note off), which is a useful abstraction for keyboards but deeply impoverished for orchestral music.
Timbre is a multi-dimensional continuous space (brightness, roughness, warmth, attack characteristics), and it changes dynamically within a single note as a violinist varies bow pressure or a trumpeter adjusts embouchure. Current AI models flatten all of this into instrument labels.
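To make the gap concrete, here is a minimal sketch of what a note-event representation keeps versus what it drops. It uses the spectral centroid (the amplitude-weighted mean frequency of a tone's partials), a common rough proxy for brightness. The `NoteEvent` class and the harmonic amplitudes are assumptions invented for illustration, not measurements of real instruments.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    """What a MIDI-style representation keeps for one note."""
    pitch: int       # MIDI note number; 69 is A4 (440 Hz)
    velocity: int    # 0-127
    duration: float  # seconds


def spectral_centroid(fundamental_hz, harmonic_amps):
    """Amplitude-weighted mean frequency of the partials, in Hz."""
    freqs = [fundamental_hz * (k + 1) for k in range(len(harmonic_amps))]
    weighted = sum(f * a for f, a in zip(freqs, harmonic_amps))
    return weighted / sum(harmonic_amps)


# Hypothetical amplitudes for harmonics 1-4 (illustrative values only).
mellow = [1.0, 0.5, 0.25, 0.125]  # energy falls off quickly
bright = [1.0, 0.9, 0.8, 0.7]     # upper harmonics stay strong

# Both tones map to the same note event...
note = NoteEvent(pitch=69, velocity=80, duration=1.0)

# ...but their brightness differs.
mellow_centroid = spectral_centroid(440.0, mellow)
bright_centroid = spectral_centroid(440.0, bright)
print(note, round(mellow_centroid, 1), round(bright_centroid, 1))
```

Both tones are "A4, velocity 80, one second" as far as the event stream is concerned, yet the second has a noticeably higher centroid. Recomputing the centroid frame by frame would also capture the within-note changes that a single instrument label cannot.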
Some promising directions: spectral analysis models that work directly with audio representations, or hybrid approaches that combine symbolic notation with timbral descriptors. Google's Magenta has done interesting work on learned audio synthesis (NSynth and DDSP, for example), and IRCAM's tools model timbre as a navigable space.
Exactly. I'm building a tool that represents timbre as a continuous embedding space — you can morph between instruments, blend textures, and specify 'the warmth of a cello with the attack of a marimba'. Early prototypes are promising but the computational cost is brutal.
A timbral embedding space is the right abstraction. Once you have that, composition becomes navigation through a multi-dimensional landscape of sound rather than placing discrete symbols on a staff.
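As a rough sketch of what navigation could look like, assume each timbre is a small vector with named axes. The `Timbre` class, its four axes, and the cello and marimba values are all hypothetical placeholders; a learned embedding would likely have many more dimensions, and they would not necessarily be this interpretable.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Timbre:
    """Hypothetical 4-axis timbre vector; every axis is in [0, 1]."""
    warmth: float
    brightness: float
    roughness: float
    attack: float  # higher means a sharper onset


def morph(a, b, t):
    """Linear interpolation: t=0 returns a, t=1 returns b."""
    return Timbre(*[
        (1 - t) * getattr(a, f.name) + t * getattr(b, f.name)
        for f in fields(Timbre)
    ])


def borrow(base, donor, *traits):
    """Copy of base with the named traits taken from donor."""
    return replace(base, **{name: getattr(donor, name) for name in traits})


# Illustrative values only, not derived from recordings.
cello = Timbre(warmth=0.9, brightness=0.3, roughness=0.4, attack=0.2)
marimba = Timbre(warmth=0.6, brightness=0.5, roughness=0.1, attack=0.9)

halfway = morph(cello, marimba, 0.5)      # a point midway between the two
hybrid = borrow(cello, marimba, "attack")  # cello warmth, marimba attack
print(halfway)
print(hybrid)
```

Here `morph` traces a straight line between two points, while `borrow` mixes axes from different sources, which is one way to read "the warmth of a cello with the attack of a marimba". Whether a straight line in the embedding sounds like a smooth morph depends entirely on how the space is learned.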
For the computational cost, consider a two-stage approach: compose in a compressed timbral space (fast, interactive), then render to full audio only for the final output. This mirrors how architects work — sketch in low-fidelity, render in high-fidelity.
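A toy version of that split might look like the following. The keyframe format, the function names, and the use of simple additive synthesis as the "expensive" renderer are all assumptions chosen to keep the sketch self-contained; a real system would more likely decode the control curve with a neural synthesizer. Keyframe times are assumed to be strictly increasing.

```python
import math


def brightness_at(keyframes, t):
    """Linearly interpolate brightness between (time, value) keyframes."""
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    for (t0, b0), (t1, b1) in zip(keyframes, keyframes[1:]):
        if t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return keyframes[-1][1]


def preview(keyframes, duration, steps=8):
    """Stage 1: a coarse control curve, cheap to recompute on every edit."""
    return [brightness_at(keyframes, duration * i / (steps - 1))
            for i in range(steps)]


def render(keyframes, pitch_hz, duration, sample_rate=8000, n_harmonics=4):
    """Stage 2: per-sample additive synthesis, run once for the final output."""
    samples = []
    for n in range(int(duration * sample_rate)):
        t = n / sample_rate
        b = brightness_at(keyframes, t)
        # Brighter means stronger upper harmonics: harmonic k+1 gets b**k.
        amps = [b ** k for k in range(n_harmonics)]
        value = sum(a * math.sin(2 * math.pi * pitch_hz * (k + 1) * t)
                    for k, a in enumerate(amps))
        samples.append(value / sum(amps))  # keep output within [-1, 1]
    return samples


# A one-second note that gets brighter over time (illustrative values).
keyframes = [(0.0, 0.1), (1.0, 0.8)]
curve = preview(keyframes, duration=1.0)            # 8 control points
audio = render(keyframes, pitch_hz=220.0, duration=1.0)  # 8000 samples
print(len(curve), len(audio))
```

The interactive loop only ever touches `keyframes` and `preview`, so its cost scales with the number of control points rather than the number of audio samples. The per-sample work in `render` is deferred until the composer commits.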
The exciting implication: if timbre becomes a first-class compositional parameter, it enables entirely new forms of music that weren't possible with traditional notation. You could write a piece where the 'melody' is a journey through timbral space rather than pitch space.