The hair, the clothes, and in particular the music.
Even if you knew nothing about musical instruments in general, and synthesizers in particular, I could play you a few bars of a number of popular songs from the era, and you would probably go: “Oh, I remember that sound.”
Whether it was the electric piano in almost any ballad of the time, the bass in Howard Jones’ “What Is Love?”, or the ominous bells of the Top Gun Anthem, they were all generated by the same synthesizer, or at least one of its contemporary siblings – the Yamaha DX7.
In a world where electronic music was dominated by analog instruments, the DX7 offered an entirely fresh palette of crisp, clangorous sounds never heard before (unless, that is, you happened to be a regular in Quincy Jones’ studio), at a reasonable price. So reasonable, in fact, that my best friend withdrew all of his savings from a few summer jobs and bought a DX7 of his own.
We spent countless hours figuring out how to make sounds slightly reminiscent of clarinets, bells, or empty bottles being struck. Eventually, we wrote our final high school report on how FM synthesis works (thanks to the legendary book Yamaha DX7 Digital Synthesizer by Yasuhiko Fukuda). Through numerous late nights peering at an oscilloscope our school generously offered on loan, aided by a painfully slow Pascal program on an IBM PC that mimicked the observed waveform given the corresponding DX7 patch parameters, we worked out how the parameters could magically create square-ish waveforms.
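As a brief aside for readers unfamiliar with FM synthesis: in its simplest form, one sine oscillator (the modulator) perturbs the phase of another (the carrier), and the modulation index controls how many sidebands – and thus how much brightness – appear in the spectrum. A DX7 chains up to six such operators; the minimal two-operator sketch below is only an illustration, and the frequency, ratio, and index values are arbitrary, not actual DX7 parameters:

```python
import math

SAMPLE_RATE = 44100  # samples per second

def fm_sample(t, carrier_hz, ratio, index):
    """Two-operator FM at time t (seconds): a sine modulator, running at
    carrier_hz * ratio, modulates the phase of a sine carrier. The
    modulation index controls the strength (and thus the brightness)."""
    mod = math.sin(2 * math.pi * carrier_hz * ratio * t)
    return math.sin(2 * math.pi * carrier_hz * t + index * mod)

# A short burst of a 440 Hz tone; a 1:1 ratio with a fairly high index
# already yields a bright, overtone-rich spectrum
wave = [fm_sample(n / SAMPLE_RATE, 440.0, 1.0, 4.0) for n in range(101)]
```

Sweeping the index over time (as the DX7 does via its operator envelopes) is what makes the timbre evolve within a single note.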
We proudly named the report “Frequency Modulation in Theory and Practice”, scored an A, and somewhat reluctantly returned the oscilloscope. Then life carried us in different directions, and the allure of college life soon made us forget all about it.
This spring, I trained an AI system to program sounds for the Yamaha DX7. In the coming article series, I will share the story of how “Frequency Modulation in Theory and Practice” continued, some thirty-five years later, with surprising results.
The approach I used was to train not just one neural network but two, in a configuration known as a GAN – a Generative Adversarial Network. This is an architecture for training deep learning models that became popular through image generation: it consists of two competing neural nets, one trying to outsmart the other.
In our case, one of the nets, called the generator, gradually learns how to create the parameters for a valid DX7 patch, while the other, called the discriminator, learns how to discern reasonable patches (ones that actually produce a playable sound once loaded into a DX7) from unreasonable ones (ones that make no sense to the DX7, or result in sounds that are neither playable nor interesting).
By connecting the generator and discriminator in a feedback loop, we have a GAN: the generator tries to trick the discriminator by producing patch data of ever better quality, while the discriminator is presented alternately with real patches (made and categorized by humans) and “fake” patches created by the generator. If all goes well, the generator ultimately leaves the discriminator unable to differentiate between patches created by a human and those created by an algorithm.
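For the technically curious, the feedback loop described above can be sketched in a few lines of PyTorch. To be clear, this is a hypothetical, minimal illustration rather than the actual Deep DX code: the patch dimensionality (an unpacked DX7 voice holds roughly 155 parameters, here flattened and normalized to [0, 1]), the network sizes, and the random stand-in for “real” patches are all assumptions made for the sketch.

```python
import torch
from torch import nn

PATCH_DIM = 155   # assumed: one DX7 voice flattened to ~155 normalized values
NOISE_DIM = 64    # latent vector size (arbitrary choice)

# Generator: latent noise -> a patch-parameter vector in [0, 1]
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 128), nn.ReLU(),
    nn.Linear(128, PATCH_DIM), nn.Sigmoid(),
)

# Discriminator: patch vector -> probability it is a "real" (human-made) patch
discriminator = nn.Sequential(
    nn.Linear(PATCH_DIM, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_patches):
    batch = real_patches.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Discriminator step: learn to tell real patches from generated ones
    noise = torch.randn(batch, NOISE_DIM)
    fake_patches = generator(noise).detach()  # don't update G here
    d_loss = (bce(discriminator(real_patches), real_labels)
              + bce(discriminator(fake_patches), fake_labels))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Generator step: learn to make the discriminator call fakes "real"
    noise = torch.randn(batch, NOISE_DIM)
    g_loss = bce(discriminator(generator(noise)), real_labels)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()

# Random vectors as a stand-in for a batch of curated human-made patches
real = torch.rand(16, PATCH_DIM)
d_l, g_l = train_step(real)
```

In a real run, `real` would of course be drawn from the curated library of human-made patches, and the loop would repeat for many thousands of batches.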
Well, that is what is supposed to happen. But as it turned out, there were numerous potential pitfalls along the way.
Here be dragons
The first – and, to me at least, the most upsetting – obstacle was that someone else did it first.
Yes, I thought I had come up with a really clever idea: getting an AI to breathe new life into one of the most notoriously difficult synthesizers to program – yet one with a cult following – and exploring the boundaries of sound design to see what a non-human approach could come up with. So, before I set out on my own journey, I googled it – and found the site This DX7 Cartridge Does Not Exist, which actually uses a trained AI to generate “cartridges”, or rather banks of 32 sounds, for free.
Apparently, someone had beaten me to it. Somewhat deflated, I downloaded a few sound banks, tried them out in my DX7, and then shelved my idea for a few months. And then I thought: How can I do this better?
I don’t know how the author of the DX7 Cartridge We Would All Have Wanted Back In The Day approached the problem, as I haven’t studied the source code. But what I did find was that out of every generated cartridge – one “cartridge”, or rather bank, consists of 32 sound patches – only a few sounds seemed useful to me; the rest consisted of nasty squawks and harsh noise. I wanted something that could fill a bank with many useful sounds, and I wanted to be able to create sounds from a given category.
Another challenge is the fact that GANs are notoriously difficult to train. They usually don’t converge to a minimum loss or maximum quality; at best, they stabilize at an equilibrium between the two nets – and even that happens less often than you would hope.
Then there was the daunting task of classifying the training data. Fortunately, the internet has collected patches created by clever DX7 programmers over the past 40 years (!), so there was no shortage of quality sounds to use as training material. But what they don’t tell you is that the data has to be categorized before you can train such a network – at least if you want results to write home about – and unless you’re Google and can recruit millions of people on the web to do it for you (for free), you have to do it yourself. This turned out to be quite an undertaking, as we will see later.
Finally, it all had to be coded and tested. And how do you test whether sounds are useful and sweet-sounding? By ear, of course. By the end of this journey I had to refrain from listening to FM sounds for a couple of months to avoid developing a sonic allergy to them! But I have since recuperated, and quite enjoy the tool I now have: it can create not just sounds from a given category, but combinations of categories, and even variations of any favorite sound.
Through this series of articles, I will share the details of my journey, and hope to inspire some of you to try it out for yourselves. I also hope to get ideas and inspiration on how to further develop Deep DX into an even more useful tool for sound creation.