AI is taking music lessons. They’re not going great.

For composers like Tod Machover, the future of AI and music depends on collaboration with human creativity.

Perspective by The Washington Post’s classical music critic
(Illustration by Jay Vollmar for The Washington Post)

I’ve heard the future and it sounds confused. A cubist explosion of a memory of swing music. Amorphous reggae of depoliticized gibberish. A Stevie Wonder song stuck in a satellite. Smooth jazz so smooth you can barely tell you’re listening to it.

These are just some of my notes after spending time listening to the output of new AI music platforms. (I’m not calling it music just yet.)

If you’ve been online at all the past six months, you’ve probably experienced the artificial intelligence (AI) boom in its many forms: image generators that transform banal selfies into heroic anime scenes or sumptuous “oil” paintings; deepfake videos that turn the tables on TV personality Simon Cowell; chatbots that freak you right out with their weird new sentience act.

I’d say the robots are coming for your music, but they’re already here. The concept of computers as composers has abruptly shifted from enduring fantasy to virtual reality.

AI music has found its way into the mainstream — you’ve probably heard a few seconds of it even today: soundtracking ads on YouTube and Facebook, or providing the emotional context of a TikTok video.

The tech is fast becoming part of the texture of popular culture: In 2019, TikTok parent company ByteDance purchased the AI music platform Jukedeck, which created tools that could let users alter music to match videos. In 2020, Shutterstock acquired “certain assets” of Amper, another AI music platform that auto-generated music based on selected parameters such as mood, length, tempo and instrumentation.

Other AI music services, including AIVA and Beatoven, offer game developers, podcasters and content creators simple, practical and — most important — royalty-free musical backdrops for their products.

And soon enough, auto-generated music may not automatically be relegated to the background: In 2019, Endel — an algorithmic music-generation app that its website says “takes in various internal and external inputs and creates an optimal environment for your current context, state, and goal” — became the “first-ever algorithm to sign [a] major label deal,” according to a press release.

This is not music to everyone’s ears.

This technology “is generating infinite music that isn’t actually composed by anybody, and that’s a terrible, scary, awful way of thinking about where music could go,” says composer Tod Machover. “I mean, really, it’s the worst kind of elevator music.”

The human touch is important to Machover. A musician, composer, inventor and professor, Machover heads the Opera of the Future group at the Massachusetts Institute of Technology’s Media Lab, where he uses AI extensively in his own work and in beta-testing his students’ ideas. The research group focuses on the exploration of “concepts and techniques to help advance the future of musical composition, performance, learning, and expression” — and artificial intelligence is on everyone’s mind.

For Machover, AI represents a way to exponentially increase access to music and creative tools for making it. Since 1986, he has worked on the development of “hyperinstruments,” which employ sensors, signal processing and software not just to give a boost of musical power to virtuosic performers (like cellist Yo-Yo Ma), but also to build interactive musical instruments for nonprofessional musicians, students, music lovers and the public. He has also developed Hyperscore, described by Machover as a “graphic composition language for young people.”

Recently Machover created a 2.0 version of his 1996 “Brain Opera,” based on the ideas of Marvin Minsky, the 20th-century mathematician, psychologist, lifelong pianist and one of the founding minds behind artificial intelligence. Minsky, who died in 2016, explored all these fascinations in his writings — notably 1981’s “Music, Mind and Meaning,” for which Machover recently wrote a postlude.

Machover’s ongoing series of “City Symphonies” uses AI to organize thousands of sounds that have been crowdsourced from residents of various cities. The sounds can be accessed through a special app, mixed (or “stirred”) with the swipe of a finger.

And with students at the MIT Media Lab, he has overseen projects such as an experimental radio station that plays only songs generated by AI, and a composition framework by doctoral student Manaswi Mishra that can be operated with one’s voice.

On March 7 in New York, Machover will premiere “Overstory Overture,” his first operatic work since 2018’s “Schoenberg in Hollywood.” Mezzo-soprano Joyce DiDonato will sing the 30-minute piece, commissioned and performed by the chamber ensemble Sejong Soloists. It is the first movement in a longer work Machover is composing based on Richard Powers’s Pulitzer Prize-winning 2018 novel “The Overstory,” a densely layered book about the intricately interconnected lives of trees and humans.

The novel’s rich colors and detailed evocations of the natural world inspired Machover to conjure the sounds of a forest as closely as possible. So he used AI to create a “language of trees” — a “low music” that evokes the root systems as well as more particulate sounds (think pollen) — and fed the AI system his own palette of electronic sounds and recordings of a detuned cello.

“I try to make models that are productive and useful and interesting and beautiful,” the composer says, “and I personally believe in a kind of collaboration between people and technology.”

For as long as humans have been making music, we’ve been trying to find ways for music to make itself.

At first we turned to nature: The Aeolian harps found across ancient civilizations required only the intervention of the wind to share their song. Millennia later, the mechanical organs of the 16th century would use the force of flowing water to draw breath into their bellows.

Then we started playing God: In 1738, the French inventor Jacques de Vaucanson presented the Academy of Sciences in Paris with two musical automatons (a mechanical flute player and a tambourine player) and one decidedly not (a gilded copper duck capable of quacking, snacking and — let’s just say — digesting).

Then we went through our industrial phase: The calliope appeared atop a plume of pressurized steam in the early 1850s. And horologists — i.e. clockmakers — oversaw a subsequent century of innovations in home music boxes, roller organs, player pianos and other clunky crank-powered, disc-driven or pneumatic musical diversions: symphonions, polyphons, orchestrions, to name a few.

Then we got futuristic: The advent of analog synthesis and sequencing in the mid-20th century ushered in a wave of automated and programmable music, including the early tape-loop experiments of such artists as Éliane Radigue, Terry Riley, Karlheinz Stockhausen and especially Brian Eno, whose self-generating music techniques on his 1975 album “Discreet Music” helped define the ambient music genre.

But the last real paradigm shift arrived in 1983, when the first five-pin cable was used to transmit MIDI — musical instrument digital interface — a digital musical language that revolutionized the way music is conceptualized, composed, recorded and performed.
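
(For the technically curious, here is a minimal sketch, in Python, of what that digital language actually carries: not recorded sound but terse numeric instructions specifying which note sounds, how hard it is struck and when it stops. The byte values follow the published MIDI 1.0 specification; the snippet itself is purely illustrative.)

```python
# Illustrative sketch: MIDI transmits events, not audio. A "note on" message
# is three bytes: a status byte, a note number (0-127) and a velocity.
NOTE_ON, NOTE_OFF = 0x90, 0x80   # status bytes for MIDI channel 1
MIDDLE_C = 60                    # note number for middle C
VELOCITY = 64                    # how hard the "key" is struck

note_on = bytes([NOTE_ON, MIDDLE_C, VELOCITY])
note_off = bytes([NOTE_OFF, MIDDLE_C, 0])

print(note_on.hex(" "))   # -> "90 3c 40"
print(note_off.hex(" "))  # -> "80 3c 00"
```

Streams of such messages, plus timing, are all a sequencer needs to replay or rearrange a performance.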

The early ’80s also saw the first experiments in computer-assisted composition. Composer David Cope embarked upon his Experiments in Musical Intelligence in 1981 to battle a bout of composer’s block and ended up creating algorithmic evocations of various classical composers that, according to his website, “delighted, angered, provoked, and terrified those who have heard them.” Cope’s work, which also helped inform Machover’s development of Hyperscore, is getting a fresh listen via Jae Shim’s 2022 documentary “Opus Cope: An Algorithmic Opera.”

Today’s AI-powered music systems operate like large-scale expansions of Cope’s work — employing machine learning to sift through massive troves of data to discern patterns, textures and complexities. Like the image and video generators popular online, these systems can respond to prompts and generate (pardon the scare quotes) “original music.”
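
(To make that learning step concrete, below is a deliberately tiny Python sketch of pattern-based generation, a toy example and not any company’s actual system. It tallies which note tends to follow which in a single melody and then improvises from those statistics; commercial services swap the melody for enormous audio corpora and the tally for deep neural networks, but the learn-then-imitate logic is similar.)

```python
# Toy example, not any vendor's real system: learn which note tends to follow
# which in one melody, then "compose" by sampling those learned transitions.
import random
from collections import defaultdict

melody = ["C", "E", "G", "E", "C", "E", "G", "C", "D", "E", "D", "C"]

# Tally the note-to-note transitions heard in the source melody.
transitions = defaultdict(list)
for current, nxt in zip(melody, melody[1:]):
    transitions[current].append(nxt)

# Generate a new 12-note phrase by repeatedly sampling a plausible next note.
random.seed(7)
note = "C"
phrase = [note]
for _ in range(11):
    note = random.choice(transitions[note])
    phrase.append(note)

print(" ".join(phrase))
```

Scale that idea up by many orders of magnitude, from a dozen note names to millions of hours of audio, and you have the rough shape of today’s generators.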

But like those other generators, the “output” of these systems can resemble music in certain ways and diverge in ways I can only describe as the musical equivalent of a seventh finger or an extra row of eerily smiling teeth.

“One problem is that these systems don’t know much about music,” Machover says. “The second is that we can describe a picture pretty well. We can use something that’s purely text. But what does it mean to describe music in words?”

OpenAI — the company behind the overwhelmingly popular, rapidly developing and deeply concerning AI engines ChatGPT and DALL-E 2 — has also introduced Jukebox, a music-generating neural net unveiled in 2020 that can produce music from scratch once “trained” or “conditioned” with various inputs: audio, genre, even lyrics (the 21st-century equivalent of “maybe if you hum a few bars”).

This means that 12 seconds of a Stevie Wonder hit can “condition” the system to complete the song the way it supposes it must go. If you’ve ever listened to someone who has no memory for a tune and no knack for humming, you have an idea of how this might sound.

There’s also MusicLM, an in-progress project from a Google research team that the company says can generate “high-fidelity music from text descriptions.” These can be simple — “a calming violin melody backed by a distorted guitar riff” — or relatively complex — “Slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High-pitched bongos with ringing tones. Vocals are relaxed with a laid-back feel, very expressive.” The music is aggressively nondescript, but most of it could fit in perfectly well at a party without making too much of a scene.

But as close as some of the sounds come to musical ideas I’d willingly listen to, nothing I’ve heard from the robots approaches anything resembling a bop.

“The more you make something synthetic, the more it tries to resemble something real, the more uncomfortable it feels,” Machover says of the “uncanny valley” vibe generated by this robotroid music. “The closer it is to human, the more I can tell the difference and the more I’m like, ‘Yuck!’ It really bothers me.”

Perhaps this is why the music I’ve heard from AI generators leaves me feeling so cold and creeped out. The absence of a human hand — made glaring and discomforting through the tiniest lapses of approximated intuition — can render the listening experience a valueless exchange, the music a bunk currency.

When we know there’s a person on the other side of a song, phrase or gesture, there remains the sense of a connection, an understanding capable of collapsing time and distance — even if that person has been dead for centuries, even if that phrase exists only as a scratch on paper or wax.

But there’s no message in AI’s proverbial bottle. AI has no artistic intention, no creative spirit, nothing to get off the chest it doesn’t have. Any emotional response you might have is 100 percent projection, and it doesn’t feel great. For the moment, AI remains a model refining itself, not an artist reaching out to you. Its goal is optimal performance, not ecstatic release.

“Pieces of music aren’t just pieces of sound,” Machover says. “They’re because some human being thought something was important to communicate and express.”

On the other hand — and to hack a phrase from Aretha Franklin — systems are doing it for themselves.

Tod Machover’s “Overstory Overture” premieres March 7 at Alice Tully Hall, 1941 Broadway, New York. lincolncenter.org.