This page contains supplementary material for the paper DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis with Autoencoding GANs, presented at the Machine Learning for Audio Synthesis (MLAS) workshop at ICML 2022. DrumGAN VST is a simple and intuitive plugin for drum sound synthesis that employs Generative Adversarial Networks (GANs) and builds on previous work.
In what follows, we showcase the key capabilities of DrumGAN VST through audio and musical examples. We also show a demo of the technology integrated into Steinberg's Backbone.
Beats produced with DrumGAN-generated sounds
Producer Carli Nistal used an early version of the prototype and produced these two beats.
|Beat 1||Beat 2|
Azzaro advertisement campaign
DrumGAN was used to design most drum sounds in this track.
The A.I. Drum Kit
"The A.I. Drum Kit" is a collection of drums generated with DrumGAN and processed by French producer Twenty9. It consists of 18 808s, 20 kicks, 29 snares, 15 claps, 8 rimshots, 15 hi-hats, 8 open hats, and 12 percs. The goal was to blend DrumGAN's ability to generate drums from scratch with the taste and experience of a talented producer.
Converting beatbox to drums
We used an onset detector on a recorded beatbox performance to automatically split the file into individual sounds, which were then encoded and decoded. The result is a reconstruction of the original file built from DrumGAN-synthesized sounds. Beatbox by CJ Carr
|Original||Original + Decoded|
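The splitting step above can be sketched in a few lines. The function below is a crude energy-based stand-in for a real onset detector (the actual detector used is not specified in the text): it marks an onset wherever the short-time energy of a frame jumps sharply relative to the previous frame, and slices the signal at those points. Frame length, threshold, and the synthetic test signal are illustrative assumptions.

```python
import numpy as np

def split_at_onsets(signal, frame_len=512, threshold=4.0):
    """Split a mono signal into segments at energy-based onsets.

    A frame whose mean energy exceeds `threshold` times the previous
    frame's energy is treated as the start of a new sound.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1) + 1e-12  # avoid division by zero
    onsets = [0]
    for i in range(1, n_frames):
        if energy[i] / energy[i - 1] > threshold:
            onsets.append(i * frame_len)
    onsets.append(len(signal))
    return [signal[a:b] for a, b in zip(onsets[:-1], onsets[1:])]

# Synthetic "performance": two decaying 200 Hz hits separated by near-silence.
sr = 16000
t = np.arange(sr // 4) / sr
hit = np.sin(2 * np.pi * 200 * t) * np.exp(-20 * t)
silence = np.full(sr // 4, 1e-4)
perf = np.concatenate([hit, silence, hit, silence])

segments = split_at_onsets(perf)  # one segment per detected hit
```

In the full pipeline, each segment would then be passed through DrumGAN VST's encoder and decoder before being re-concatenated at the original onset positions.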
We compare DrumGAN VST generations with real samples and with two other neural drum synthesizers as baselines: CRASH, based on diffusion models, and Style-DrumSynth, based on StyleGAN. Overall, the baselines exhibit high bias, fail to generate timbrally diverse examples, and produce noticeable artifacts compared to real data, while DrumGAN produces high-quality samples close to the real data. Samples were randomly selected to fairly reflect the diversity and quality of each model's output. Quantitative comparisons can be found in the paper.
|Real Data||DrumGAN VST|
|CRASH||Style-DrumSynth|
Instrument Control & Interpolations
We show interpolations for DrumGAN VST. Initial and target timbres are chosen from generated samples. Because DrumGAN VST is explicitly conditioned on a global class-probability latent vector, interpolations sound like plausible instruments across all classes.
|Cymbals to Kicks|
|Kicks to Snares|
|Cymbals to Snares|
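The interpolation scheme described above can be sketched as follows. The latent dimensionality, number of classes, and one-hot class assignments below are illustrative assumptions, not the paper's actual configuration; the point is that a convex combination of two class-probability vectors is itself a valid probability vector, so every intermediate step remains a well-defined (mixed) instrument condition for the generator.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_classes, n_steps = 128, 5, 8  # illustrative sizes

# Hypothetical initial and target conditions: a noise latent plus a
# one-hot class-probability vector (e.g. "cymbal" and "kick").
z_a = rng.standard_normal(latent_dim)
z_b = rng.standard_normal(latent_dim)
c_a = np.eye(n_classes)[0]  # e.g. cymbal
c_b = np.eye(n_classes)[1]  # e.g. kick

# Linear interpolation of both the noise latent and the class vector.
alphas = np.linspace(0.0, 1.0, n_steps)
path = [((1 - a) * z_a + a * z_b, (1 - a) * c_a + a * c_b) for a in alphas]

# Every interpolated class vector stays on the probability simplex.
simplex_ok = all(abs(c.sum() - 1.0) < 1e-9 and (c >= 0).all()
                 for _, c in path)
```

Each `(z, c)` pair along `path` would be fed to the generator to render one step of the audible interpolation.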
Analysis/Synthesis of pre-existing sounds
We compare encoded and reconstructed pairs of audio examples for DrumGAN VST and Style-DrumSynth, which also incorporates an encoder analogous to ours. Encoded sounds are chosen both from the training distribution and from unseen percussion sounds. We can hear that DrumGAN VST's synthesized examples are generally perceived as closer in timbre to the original encoded sample, especially on unseen data, for which Style-DrumSynth appears to produce nearly the same output regardless of the input. We argue that the baseline's encoder may be overfitting the training data and failing to generalize to unseen examples.
Data seen during training
|DrumGAN VST||Style-DrumSynth|
Data not seen during training
|DrumGAN VST||Style-DrumSynth|
Integration into Steinberg’s Backbone 1.5
Steinberg's Backbone is a virtual instrument that provides new ways to design intricate drum sounds for every style of music. Users start by layering up to eight samples, each of which can be split into individual tonal and noise elements. The individual layers can also be resynthesized with easily accessible manipulation tools through the clearly laid out user interface.
Backbone previously required users to start with a sample from their library. In this respect, the addition of DrumGAN takes Backbone to a new level of individuality for designing original sounds.