Holly Herndon is forging ways to emote and sing with an AI
The experimental musician on creating an AI collaborator, and how having a foot in both academia and mainstream music has shaped her idea of what pop music is.
Image: Boris Camaca
Listen to Holly Herndon’s 2012 EP, Movement, and you might file it under ‘experimental’. When the electronic avant-garde musician wrote it, she was an academic by day, a laptop performer by night and programming her own processors and instruments along the way.
Fast-forward to 2019, and for her third full-length studio album, PROTO, Herndon is exploring new musical worlds opened up by AI. To do that, she’s created her own machine intelligence called Spawn that can reinterpret and resynthesise voices, and it lives in a high-spec gaming computer. We sit down with the Tennessean electronic artist to discover the impact that working with an AI has on the compositional process and why she thinks her latest record is pop music, albeit a warped version of it.
On the reasoning behind developing an AI, Herndon explains: “I’m interested in emerging technology and how it affects our daily life. In order to understand its capability and to have a fully formed opinion on it, I like to get my hands dirty and understand what it is.
“It’s something that’s been in the air for the last couple of years,” she says. It all starts with what’s coming out of the research institutes, she explains. “A new program or a new kind of architecture will be released, and then all the nerds scramble to figure out how to make it interesting. It was something that I was curious to deal with, but just never had the chance.”
Herndon’s aspirations to work on a machine-learning project might never have come to fruition had it not been for a grant from the German government. Berlin-based Herndon and collaborator Mat Dryhurst were funded by BeBeethoven, a project inspired by the prolific composer, exploring the big questions in music creation.
Although Herndon holds a PhD in composition from Stanford University and is no stranger to programming, the BeBeethoven grant allowed her to enlist developer and musician Jules LaPlace.
“Jules should be given the credit for reading the technical papers and installing the software, and then us having conversations about which architecture we wanted to use,” she explains. “I focused, this time, more on the development of the training sets with Mat. There are only so many methods that are available out there right now. It’s more about how you tweak the parameters and the most creativity for me comes with the creation of the training sets – because that is actually a performative thing. And that, of course, has such a huge impact on what it is that you come up with. That’s where I felt like I was the most useful.”
Holly and the machine
In the field of artificial intelligence in music-making, there are two main branches of research. As Herndon outlines: “The most popular approach is using MIDI data to create automatic [musical] scores. That’s something we did not want to do. We found that to be a little bit boring. We wanted to deal with audio material and sound generation instead.”
That began with the development of training data sets. “We had three main approaches,” says Herndon. “We started training it on my voice and Mat’s voice, and some little Foley sounds around the house. Then we opened up that training to our [vocal] ensemble. And then we opened it up again with the public.”
The public event, called Deep Belief, held in 2018, incorporated an interactive theatre performance where Herndon’s ensemble led the public through the creation of training sets. This involved reciting text, emoting and producing other sounds together. These were recorded and turned into more training data for the AI to understand and interpret.
The point of all of this was not to create some kind of virtual instrument that could replicate the sound of Herndon’s voice based on MIDI input. The goal was to create an AI that was a musical collaborator.
“We’re trying to view Spawn as an ensemble member,” she explains. “We have material that we would like to be performed or interpreted through both human performers and AI performers. So we see it in that paradigm, rather than Spawn in a composer role. She’s performing things through the voice of me, the ensemble or the public. That’s what we’re going for,” says Herndon.
This amounts, she says, to being “an ensemble made up of human and machine intelligence, and we’re finding a way to emote and perform and sing together”.
To move towards that end goal, Holly and her collaborators tried a couple of techniques. One of these involved a method called SampleRNN, which is described by its developers as “an unconditional end-to-end neural audio-generation model”.
“The way that we used SampleRNN was that we trained it on my voice, specifically. It’s almost a microsound technique,” says Herndon. “In granular synthesis, an audio file is spliced into grains that are rearranged, so this is like the next version of that. The computer is splicing up the canon (the training set) into grains and learning which grain usually follows the previous grain. What are the qualities of the one grain that normally follow the next? So, learning what that is, and carving that out of nothing, so there’s no sampled material. It’s just learning what those grains look like and recreating them,” she explains.
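Herndon’s description of learning “which grain usually follows the previous grain” can be illustrated with a toy sketch. This is not SampleRNN, which is a neural network, and it is not Spawn’s code. It is a much simpler stand-in that assumes a first-order transition table over quantised sample values, and every name in it (quantize, learn_transitions, generate) is hypothetical. A sine wave stands in for the recorded voice.

```python
import math
import random
from collections import defaultdict

def quantize(signal, levels=16):
    """Map samples in [-1, 1] to integer bins 0..levels-1 (toy 'grains')."""
    return [min(levels - 1, int((s + 1.0) / 2.0 * levels)) for s in signal]

def learn_transitions(symbols):
    """Count how often each symbol follows each other symbol."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(symbols, symbols[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, rng):
    """Build a new sequence one symbol at a time from the learned counts."""
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # nothing was ever observed after this symbol
        choices = list(followers.keys())
        weights = list(followers.values())
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return out

# Assumption: a plain sine wave stands in for the training recordings.
training = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
symbols = quantize(training)
model = learn_transitions(symbols)

# The output is generated from learned statistics, not copied from the input.
new_symbols = generate(model, symbols[0], 100, random.Random(0))
```

The generated sequence only ever makes moves that were observed in the training material, yet no stretch of the original is pasted in, which is the sense in which there is “no sampled material”. A real model of this kind conditions on far more history than a single previous value.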
Sound of the future
Granular is an appropriate term to describe some of the sounds Spawn makes on Herndon’s PROTO. The AI’s interpretation of the human voice ranges from glitchy sample repeats and stammering speech synthesis through to stretched, alien chorus.
“On Birth and Godmother, that’s all Spawn,” says Herndon. “On SWIM, you can hear it more, where it sounds like it’s backing vocals, but it’s really Spawn interpreting backing vocals. On some tracks, it’s more clear than on others. On Fear, Uncertainty, Doubt, there’s no Spawn on there at all.”
The fact that Spawn did not work in real-time influenced the compositional process, necessitating an asynchronous approach in the writing sessions for PROTO. “There was a lot of call and response happening,” recalls Herndon. “Sometimes, the ensemble was responding to digital process, sometimes it was responding to something that Spawn did, sometimes Spawn was responding to something that they did. It was very back and forth.”
That made for an “iterative” process, according to Herndon. That’s to say, there weren’t complete performances planned from the outset. “It’s not like I sat down and wrote sheet music and trained things and then we all performed together,” she says. Although she did write some phrases as notes on paper for the ensemble to perform, these performances would be recorded, processed and sometimes played back to then be emulated by the vocal group.
The critical element of the writing style was the interpretation of each collaborator, human and artificial. “That whole process of human interpretation with human error and AI interpretation all gives you something back that you’re responding to. It’s changing the composition. That’s the writing process. It’s different for every song. There was no template, but it was very iterative.”
The amount of change that was made after the recording sessions surprised some of those involved. Herndon remembers playing the final album back to the vocal ensemble for the first time: “They were like, ‘What? This sounds totally different from what we recorded in the studio and what we’ve been rehearsing,’” she says, laughing.
“It mutates,” she tells us. And this isn’t its final mutation either, she reveals: “It will mutate from recording to live experience for sure. There will be a different mutation, absolutely.”
Beyond the vocal, Herndon called on retro orchestral stabs and big drum sounds for the first single from PROTO, Eternal. “I was going for ecstasy, ultimate relief and grandeur,” says Herndon, who enlisted award-winning mix engineer Marta Salogni to bring the production together.
“We were joking in the studio that it was our Baywatch drum moment,” she recalls. “She’s rad. Her approach is really cool, and I feel that she almost has a prog aesthetic. She might kill me for saying that. I feel like that was how she was mixing this project. Giving the drums a lot of space and allowing them to be bombastic and wide and then having these dramatic orchestral, Fairlight stabs,” she says.
“We layered the Fairlight stabs with some orchestral stabs that went through Spawn as well. There’s often a double layer. It’s rarely one sound occupying space. We often try to layer things and then carve in and out, so the sound is evolving over time,” says Herndon.
This is a technique she attributes to computer-music pioneer John Bischoff, who was one of her tutors at Mills College in Oakland, where she studied electronic music. “He’s a really cool weirdo. He started one of the first computer bands called The Hub where they were basically networking KIM-1 computers and pinging each other to aestheticise the sound of the network.
“One of the things he told me is that if you’re gonna have repetition in your music, just make sure the sound is always evolving. LFOs and moving envelopes are your friends, and that’s something I tried to take away from that – to have the sounds evolving, because the ear can get tired,” she recounts.
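Bischoff’s advice can be sketched in a few lines. The example below is an illustration only, not anything from Herndon’s toolkit: it assumes a repeated 220Hz note with a decaying envelope, and a slow LFO that changes the level of each repeat. The names (lfo, envelope, render) and all parameter values are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second, kept low for a quick sketch

def lfo(t, rate_hz, depth):
    """Slow sine that swings the gain between 1 - depth and 1."""
    return 1.0 - depth * 0.5 * (1.0 + math.sin(2 * math.pi * rate_hz * t))

def envelope(t, note_length, decay):
    """Exponential decay that restarts at the start of every note."""
    return math.exp(-decay * (t % note_length))

def render(seconds, note_hz=220.0, note_length=0.25,
           lfo_rate=0.5, lfo_depth=0.6):
    """Repeat the same note, with the LFO reshaping each repetition."""
    samples = []
    for n in range(int(seconds * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        tone = math.sin(2 * math.pi * note_hz * t)
        gain = envelope(t, note_length, 8.0) * lfo(t, lfo_rate, lfo_depth)
        samples.append(tone * gain)
    return samples
```

Calling render(2.0) gives eight repeats of the same note, each at a different level because the LFO cycle is longer than a single note. The pitch and rhythm repeat exactly while the sound itself keeps moving.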
Once a prominent advocate for using visual-programming software Max/MSP, Herndon has changed focus for PROTO, as she tells us: “I’m not programming that much these days, because I have things that I’ve already built that I’m reusing a lot. There was more of a focus on songwriting on this album than there has been in the past. But of course, it’s about building your toolkit and I have this core toolkit that I’m using, always evolving, and Max is a huge part of that. I love exploring the shared patches that people upload online, and there are some smaller developers that I always revisit.”
Though her full list of go-to signal generators and processors is a tightly guarded secret, there’s one brand that she’s happy to shout out. “There’s this awesome guy called Alessio, he does K-Devices. He does a lot of stuff with step sequencers to help people move out of grid-like rhythms. I really like it when things have a swing to them or when things are less on the grid or have a more organic drum pattern. He’s also done some really cool granular patches.”
On PROTO, Herndon manages to walk a fine line between the experimental and the familiar and attributes this to her background. “Well, I feel like I have this kinda fucked-up idea of what pop music is,” she explains. “I feel like the music I make is really pop. I spend a lot of time in more niche communities, and academic communities, where what I do is crazy pop.”
Outside of academia, it’s a different story, as Herndon relates: “Then I play it to someone I’m working with at the label. And I’m like, ‘Isn’t this crazy pop?’ And they’re like, ‘This is not pop music. This is weird-ass niche music.’ So, I think that I have a warped view of what it is.”
Her advice for nascent experimental creators is to understand where you come from. She says: “There’s a fine line between having reverence for and learning from your forefathers and foremothers and knowing your history, so you know what you’re repeating and what’s new and responsive to today. Also, being able to kill your idols and do your own thing is very important – to know where you stand on those kinds of things – so you’re not repeating or rehashing.
“‘Experimental’ isn’t necessarily a section of the record store or one aesthetic. It should always be changing and shifting,” Herndon says. “So experimental to an 18-year-old in Cardiff should be different than a 30-year-old in San Diego. It should be responding to the reality around you.”
The unknown future
Looking to the future of AI technology in music, there are two paths, Herndon says: “I see a lot of idiomatic Muzak. That’s not interesting. And I think we’re gonna see some people that will learn from this intelligence and create new, weird shit. I think we’re going to see both.”
On the future public availability of her own creation, Spawn, Herndon points out that there’s nothing currently stopping anyone from creating their own, individual AI performer. “Most of these things are already open to the public. I haven’t released my specific training set or our own unique tweaks. But most of this stuff is on GitHub. I didn’t write SampleRNN. This is something researchers came up with. Each paper has about 10 names attached to it, and I should actually be naming all of these people.”
She has considered the consequences of releasing Spawn to the public. “We’ve thought about opening up the training to the public, but I’m really into this idea of people being acknowledged or named. And if you open it up, in an anonymous way, that gets out of control. We’ve done some of these public training performances where everybody knew what they were getting into, but we’re not quite ready to open Spawn up to the wider community yet. Maybe one day.”