
DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis with Autoencoding GANs

Javier Nistal, Cyran Aouameur, Ithan Velarde, Stefan Lattner

Companion Website

This page contains supplementary material for the paper DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis with Autoencoding GANs, presented at the Machine Learning for Audio Synthesis (MLAS) workshop at ICML 2022. DrumGAN VST is a simple and intuitive plugin for drum sound synthesis based on Generative Adversarial Networks (GANs) and inspired by previous work [1].

DrumGAN VST offers the following key features:

  1. Operation on audio at a 44.1 kHz sample rate
  2. Continuous instrument control over kick, snare, and cymbals: choose the amount of “kickness”, “snareness”, and “cymbalness” you want to confer to the synthesized sound. This control can also be used to create hybrid sounds that morph characteristics from each instrument class, or to explore unusual sounds by setting these parameters to unrealistic combinations (e.g., all to zero).
  3. Analysis/resynthesis, a.k.a. encoding/decoding: DrumGAN VST features an encoding neural network that can be used to analyze/encode a pre-existing sound and decode/resynthesize variations of it (a minimal code sketch of both controls follows this list).
  4. The DrumGAN technology is available as an integrated feature in Steinberg’s Backbone 1.5.
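
To make these controls concrete, here is a minimal Python sketch of how the class-probability conditioning and the encode/decode round trip could be driven programmatically. The generate and encode functions are hypothetical stand-ins for the model described in the paper, not a published API; the plugin itself exposes these controls through its VST interface.

    import numpy as np

    # Hypothetical interface to the DrumGAN model; `generate` and `encode`
    # are illustrative names only, not part of an actual published package.
    # from drumgan_wrapper import generate, encode

    LATENT_DIM = 128        # assumed latent size, for illustration only
    SAMPLE_RATE = 44_100    # the model operates on 44.1 kHz audio

    # Continuous instrument control: [kickness, snareness, cymbalness].
    # Unrealistic combinations (e.g., all zeros) can be used to explore
    # unusual, hybrid timbres.
    z = np.random.randn(LATENT_DIM).astype(np.float32)
    class_probs = np.array([0.8, 0.2, 0.0], dtype=np.float32)  # mostly "kick"
    audio = generate(z, class_probs)        # -> mono float array at 44.1 kHz

    # Analysis/resynthesis: encode a pre-existing sound, then decode a
    # variation by perturbing the recovered latent code.
    z_enc, probs_enc = encode(existing_sound)   # existing_sound: mono 44.1 kHz array
    variation = generate(z_enc + 0.1 * np.random.randn(LATENT_DIM), probs_enc)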

In what follows, we showcase the aforementioned capabilities of DrumGAN VST by providing some audio and musical examples. We also show a demo of the technology integrated into Steinberg’s Backbone.

Music Examples

Beats produced with DrumGAN-generated sounds

Producer Carli Nistal used an early version of the prototype to produce these two beats.

Beat 1 Beat 2

Azzaro advertisement campaign

DrumGAN was used to design most drum sounds in this track.

The A.I. Drum Kit

« The A.I. Drum Kit » is a collection of drum sounds generated with DrumGAN and processed by French producer Twenty9. It consists of 18 808s, 20 kicks, 29 snares, 15 claps, 8 rimshots, 15 hi-hats, 8 open hats, and 12 percs. The goal was to blend DrumGAN’s ability to generate drums from scratch with the taste and experience of a talented producer.

After a few months of experimenting with DrumGAN, Twenty9 released his drum kit made of AI-generated sounds.

Converting beatbox to drums

We used an onset detector on a recorded beatbox performance to automatically split the file into individual sounds, which were then encoded and decoded. In the end, we obtain a reconstruction of the original file with DrumGAN-synthesized sounds. Beatbox by CJ Carr.
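
As a rough sketch of this pipeline, the onset-based splitting can be done with a standard library such as librosa, while the encode/generate calls below again stand for the hypothetical model interface used in the sketch above:

    import librosa
    import numpy as np

    # Load the beatbox recording at the model's sample rate.
    y, sr = librosa.load("beatbox.wav", sr=44_100, mono=True)

    # Detect onsets and use them as split points between individual hits.
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples")
    bounds = np.concatenate([onsets, [len(y)]])

    # Encode/decode each segment with DrumGAN (hypothetical interface), then
    # place the resynthesized hit back at the original onset position.
    reconstruction = np.zeros_like(y)
    for start, end in zip(bounds[:-1], bounds[1:]):
        z, probs = encode(y[start:end])
        hit = generate(z, probs)
        n = min(len(hit), end - start)
        reconstruction[start:start + n] = hit[:n]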

Original Original + Decoded

Baseline Comparison

We compare DrumGAN VST generations with real samples and with two other neural drum synthesizers as baselines: CRASH [2], based on diffusion models, and Style-DrumSynth [3], based on StyleGAN. Overall, the baselines show high bias, fail to generate timbrally diverse examples, and exhibit noticeable artifacts compared to real data, while DrumGAN produces high-quality samples close to the real data. Samples were randomly selected to fairly reflect the diversity and quality of each model’s output. Quantitative comparisons can be found in the paper.

Real Data DrumGAN VST
CRASH [2] Style-DrumSynth [3]

Instrument Control & Interpolations

We show interpolations for DrumGAN VST. Initial and target timbres are chosen from generated samples. Because DrumGAN VST is explicitly conditioned on a global class-probability latent vector, interpolations sound like reasonable instruments across all classes.
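
A linear interpolation of this kind can be sketched as follows: both the latent code and the class-probability vector are blended, and each intermediate point is decoded into audio. The generate call again stands for the hypothetical model interface used in the sketches above.

    import numpy as np

    def interpolate(z_a, z_b, probs_a, probs_b, steps=9):
        """Linearly blend latent codes and class probabilities, then decode."""
        sounds = []
        for alpha in np.linspace(0.0, 1.0, steps):
            z = (1.0 - alpha) * z_a + alpha * z_b
            probs = (1.0 - alpha) * probs_a + alpha * probs_b
            sounds.append(generate(z, probs))  # hypothetical model call
        return sounds

    # Example: inter-class interpolation from a cymbal-like to a kick-like sound.
    z_a, z_b = np.random.randn(2, 128).astype(np.float32)
    cymbal = np.array([0.0, 0.0, 1.0], dtype=np.float32)   # [kick, snare, cymbal]
    kick = np.array([1.0, 0.0, 0.0], dtype=np.float32)
    sequence = interpolate(z_a, z_b, cymbal, kick)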

Inter-class Interpolations

Cymbals to Kicks
Kicks to Snares
Cymbals to Snares

Intra-class Interpolations

Kicks
Snares
Cymbals

Analysis/Synthesis of pre-existing sounds

We compare encoded and reconstructed pairs of audio examples for DrumGAN VST and Style-DrumSynth [3], which also incorporates an encoder analogous to ours. Encoded sounds are chosen from the training distribution as well as from unseen percussion sounds. DrumGAN VST’s resyntheses are generally perceived as closer in timbre to the original encoded sample, especially for unseen data, for which Style-DrumSynth appears to generate the same example regardless of the input. We argue that the baseline’s encoder may be overfitting the training data and failing to generalize to unseen examples.

Data seen during training

DrumGAN VST Style-DrumSynth [3]

Data not seen during training

DrumGAN VST Style-DrumSynth [3]
Unseen Kicks
Unseen Snares
Unseen Cymbals
Unseen Other

Integration into Steinberg’s Backbone 1.5

Steinberg’s Backbone is a virtual instrument that primarily provides new ways to design intricate drum sounds for every style of music. Users start by layering up to eight samples, which can be split into individual tonal or noise elements. The individual layers can also be resynthesized with easily accessible manipulation tools through the clearly laid-out user interface.

Backbone previously required users to start with a sample from their library. In that respect, the addition of DrumGAN takes Backbone to a new level of individuality for designing original sounds.

Steinberg announces collaboration with Sony Computer Science Laboratories – Paris, with its AI-driven DrumGAN development incorporated into the latest release of Backbone.

References
[1] Nistal, J., Lattner, S., and Richard, G. DrumGAN: Synthesis of Drum Sounds With Timbral Feature Conditioning Using Generative Adversarial Networks. In Proc. of the 21st International Society for Music Information Retrieval Conference, ISMIR 2020.

[2] Rouard, S. and Hadjeres, G. CRASH: Raw Audio Score-Based Generative Modelling for Controllable High-Resolution Drum Sound Synthesis. In Proc. of the 22nd International Society for Music Information Retrieval Conference, ISMIR 2021.

[3] Drysdale, J., Tomczak, M., and Hockman, J. Style-Based Drum Synthesis With GAN Inversion. In Extended Abstracts for the Late-Breaking Demo Sessions of the 22nd International Society for Music Information Retrieval Conference, ISMIR 2021.