This website accompanies the DAFx 2024 Late-breaking submission "Synthesizer Sound Matching using Audio Spectrogram Transformers." Here, we provide audio demos of the in-domain and out-of-domain performance of our Audio Spectrogram Transformer model for synthesizer sound matching on a 16-parameter dataset generated using the Massive synthesizer. In-domain examples consist of randomly generated Massive synthesizer one-shots from our test set, while out-of-domain examples include sounds from other synthesizers and vocal imitations of synth sounds. All samples are rendered at the pitch of the input sample.
Abstract
Systems for synthesizer sound matching, which automatically set the parameters of a synthesizer to emulate an input sound, have the potential to make the process of synthesizer programming faster and easier for novice and experienced musicians alike, whilst also affording new means of interaction with synthesizers. Considering the enormous variety of synthesizers in the marketplace, and the high level of complexity of many leading software synthesizers, general-purpose sound matching systems that function with minimal knowledge or prior assumptions about the underlying synthesis architecture are particularly desirable. With this in mind, we introduce a synthesizer sound matching model based on the Audio Spectrogram Transformer (AST). We demonstrate the viability of this model by training on a large synthetic dataset of randomly generated samples from the popular Massive synthesizer. We show that this model can reconstruct parameters of samples generated from a set of 16 parameters, highlighting its improved fidelity relative to multi-layer perceptron (MLP) and convolutional neural network (CNN) baselines. We also provide audio examples demonstrating the out-of-domain performance of this model in emulating vocal imitations, sounds from other synthesizers, and other musical instrumental sounds.
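As a rough illustration of the input stage that Audio Spectrogram Transformers are generally built around, the sketch below cuts a spectrogram into fixed-size time-frequency patches, each of which becomes one token for the transformer. The function names, the 16 x 16 patch size, the stride values, and the 128 x 256 spectrogram shape are assumptions chosen for illustration, not the configuration used in our model; the transformer itself and the 16-parameter output head are not shown.

```python
from typing import List

# A spectrogram as nested lists: rows are frequency bins, columns are time frames.
Spectrogram = List[List[float]]


def split_into_patches(spec: Spectrogram, patch: int = 16, stride: int = 16) -> List[List[float]]:
    """Flatten a 2-D spectrogram into a sequence of patch vectors (tokens).

    Each patch covers patch x patch time-frequency cells. Patches that would
    run past the edge are dropped; this sketch does no padding.
    """
    n_freq = len(spec)
    n_time = len(spec[0]) if spec else 0
    tokens = []
    for f0 in range(0, n_freq - patch + 1, stride):
        for t0 in range(0, n_time - patch + 1, stride):
            # Read the patch row by row into one flat vector.
            tokens.append(
                [spec[f][t] for f in range(f0, f0 + patch) for t in range(t0, t0 + patch)]
            )
    return tokens


def num_patches(n_freq: int, n_time: int, patch: int = 16, stride: int = 16) -> int:
    """Number of tokens produced by split_into_patches for a given input size."""
    def per_axis(n: int) -> int:
        return 0 if n < patch else (n - patch) // stride + 1

    return per_axis(n_freq) * per_axis(n_time)


if __name__ == "__main__":
    # Hypothetical input: 128 frequency bins by 256 time frames of silence.
    spec = [[0.0] * 256 for _ in range(128)]
    tokens = split_into_patches(spec)
    print(len(tokens), len(tokens[0]))
```

A stride smaller than the patch size gives overlapping patches and therefore a longer token sequence, which trades compute for finer time-frequency resolution.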
Audio Examples
In-domain:
Input: Random Massive preset | Output: Reconstructed Massive preset | |
Sample 1 | ||
Sample 2 | ||
Sample 3 | ||
Sample 4 | ||
Sample 5 | ||
Sample 6 | ||
Sample 7 | ||
Sample 8 | ||
Sample 9 | ||
Sample 10 | ||
Sample 11 | ||
Sample 12 |
Out-of-domain:
Input: | Output: | |
Sample 1 - Arp 5th | ||
Sample 2 - Cymbal | ||
Sample 3 - DX7 Bass | ||
Sample 4 - Soft Synth | ||
Sample 5 - Guitar | ||
Sample 6 - Laser | ||
Sample 7 - Metallic Synth | ||
Sample 8 - Sub bass | ||
Sample 9 - Vocal Imitation 1 | ||
Sample 10 - Vocal Imitation 2 | ||
Sample 11 - Vocal Imitation 3 | ||
Sample 12 - Vocal Imitation 4 |