When you purchase through affiliate links on MusicTech.com, you may contribute to our site through commissions. Learn more

Why Wolfgang Gartner used custom-trained AI vocals on ‘Automatic’

The Grammy-nominated producer details his idiosyncratic process of using two different speech-to-speech resynthesis applications designed for voiceover work and video game audio.

by Rachel Narozniak

One minute and 51 seconds into Automatic, a husky voice that evokes the rasp and gravitas of a 90s rap verse fills the sonic frame.

“Ya know, sometimes people ask me like, ‘How long did it take? How long do I have to work at it?’ And I’m not gonna lie, some people got it; some don’t.”

The vocal texture is as gravelly as its delivery is authoritative. “To me, it sounds like Ol’ Dirty Bastard combined with The Notorious B.I.G,” Wolfgang Gartner muses.

Ol’ Dirty Bastard, The Notorious B.I.G., and the word “elderly.” At first blush, the outlier in this series is clear. Yet all three synergise in the context of Automatic, Gartner’s first drum ‘n’ bass production in more than 25 years, released via Dim Mak Records on February 9.

Adding to the single’s novelty is Gartner’s unusual approach to generating its rap verses. After writing and rapping them himself, Gartner morphed them into three custom-trained artificial intelligence (AI) voice models using two different speech-to-speech resynthesis applications designed for voiceover work and video game audio: Altered Studio and Respeecher.

That presence-commanding, spoken-word vocal that filters in just before the two-minute mark is a synthetic voice from Respeecher’s elderly category. It even has a name: Prospero. Each of the voices available in Respeecher’s portfolio, which currently spans 102 different options, is classified by name, age, pitch, and country of origin. Upon selecting a voice, users of the subscription service, billed by its creators as a tool capable of producing “Hollywood-quality voices,” can adjust the speech and pitch-shift it to create the desired effect.

To achieve the resonant vocal at the centre of Automatic, Gartner transposed Prospero’s voice down eight semitones. First, though, he had to record himself speaking the lyrics, pronouncing each exactly as intended while carefully considering cadence and inflection. This step, Gartner imagines, is not unlike professional voice actors’ preparation for the cartoons they voice.
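For a sense of scale, the arithmetic behind an eight-semitone drop can be sketched in a few lines of Python. This is a minimal illustration that assumes standard 12-tone equal temperament, where each semitone scales frequency by the twelfth root of two. It is not a description of how Respeecher’s pitch control works internally, and the function names and the 200 Hz starting pitch are hypothetical values chosen for the example.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def shift_frequency(freq_hz: float, semitones: float) -> float:
    """Return the frequency that results from shifting freq_hz by the given semitones."""
    return freq_hz * semitone_ratio(semitones)


# Eight semitones down, as described for the Prospero voice.
ratio = semitone_ratio(-8)
print(f"Frequency ratio: {ratio:.3f}")  # prints "Frequency ratio: 0.630"

# Hypothetical example: a 200 Hz fundamental shifted down eight semitones.
print(f"{shift_frequency(200.0, -8):.1f} Hz")  # prints "126.0 Hz"
```

In other words, a shift of that size lowers every pitch to roughly 63 per cent of its original frequency, a little more than halfway to a full octave down.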

“I had to practise saying things how I wanted to…I was walking around my house practising voices all the time for a while,” he says. “I would never want to actually [use these voices in front of] another person or let them hear the original audio recordings that I uploaded because it’s embarrassing and because it’s like a cartoon voice…I’ve done this for female vocals as well and had to practise talking [and] rapping like a girl for days before the results sounded even close to good.”

After getting in character to “learn the voice” of this interior rap verse, down to the pronunciation of each word, Gartner recorded the vocals for this section “at least 15 or 20 times in a row.” He then stacked them and slowly worked his way through the audio, one word at a time. He followed this formula to engineer the two main verses that bookend Prospero’s. This time, the Grammy-nominated producer used Altered Studio, a voice content creation platform offering speech-to-speech voice morphing and voice cloning, among other capabilities.

“Every time I [heard] a word where I was like ‘yes, that’s how I wanted that word to sound,’ then I chopped it,” he recalls. “This is how I did it just to get the vocal to submit to the AI, and then I did it on the other side of [the AI] too. Literally every single word that I submitted as audio was chopped from one take just to get everything perfectly.”

That Gartner has both an eye and an ear for minutiae is evident in how he details the individual steps that, together, yield the three rap verses.

As he speaks, he embodies the essence of a mad scientist working solitarily, meticulously, and with a singular focus on an experiment that challenges convention and at which others will marvel. The creative process behind Automatic does just that, albeit with rap verses and beats rather than test tubes and beakers.

Candidly, Altered Studio and Respeecher weren’t designed for this use case, and of course, there are far faster and more straightforward ways to source vocals. But Gartner had time. He’d dedicated two years to making an album (his awaited answer to 2016’s 10 Ways to Steal Home Plate), so he wasn’t touring as much as usual.

That spare time, a commodity that seems always to be in short supply for DJ/producers, combined with Gartner’s drive to do something different, something beyond commissioning a singer-songwriter or pulling a vocal from a sample pack, gave rise to Automatic. It’s one of several singles to come from the would-be follow-up to 10 Ways to Steal Home Plate.

“I did make an album, but we’ve since decided to just sign [the tracks] individually as singles,” he says. The first, Level Up featuring UK MC Scrufizzer, landed last December via Dim Mak Records.

Gartner doesn’t mince his words. He tells me unequivocally, and with his whole chest, that creating and training the voice models for Automatic “took forever.” And although he wagers that he could conceptualise and fashion the rap verses in just 10 days now (the vocals alone took at least a month, from start to finish), he claims he is one of the few electronic producers who would voluntarily do so.

“Part of the reason why I love it is because I know that most people, once they find out what it would take to get what they want or to get what I get out of it…” he pauses, then adds, “I don’t know anybody who would be willing to do it.”

The question is: why do it?

In the complete creative control that these speech-to-speech resynthesis applications afford, Gartner finds the conceptual freedom he’s sought his entire career.

“This is the missing link, the thing that I had been looking for all my life. Because ever since I made my first song when I was 11 in 1993, my songs had lyrics. I always sang them, but the sound of my voice just isn’t cool,” Gartner recalls.

“There’s nothing you can do about that: you’re either born with great vocal cords or not. So, my whole career I’ve used vocalists to do verses for me. And then over the past five or six years, I started writing the entire vocal, singing a demo of it, and then having vocalists just re-sing it for me. That was what I always wanted to do, but most vocalists I think weren’t really into that.”

Still, he acknowledges that neither Altered Studio nor Respeecher is a seamless substitute. Because the applications, commonly used for voiceover work, were not developed to support the creation of vocals for a musical production, it’s understandably easier to generate a rap or spoken-word vocal than it is a more melodic, singing vocal. The latter would require significantly more editing than a rap vocal (akin to those on Automatic) already does.

These services were not built for this use case, but in this unprecedented era of AI, generative and otherwise, Gartner is confident that one will be in the not-so-distant future. Voice-Swap is already a promising solution for such a use case, albeit not yet a perfect one.

The advent of such platforms can be expected to come with both added convenience and legal considerations, but that’s a story — or, in Gartner’s case, a song — for another time. For now, in the Wild West of AI in music, Gartner has “developed a system and learned all the little quirks” of this technology in its current form, enabling him to leverage it more quickly and efficiently on future productions. Some might even say it’s, well, automatic.
