This website accompanies the DAFx 2024 Late-breaking submission "Synthesizer Sound Matching using Audio Spectrogram Transformers." Here, we provide audio demos of the in-domain and out-of-domain performance of our Audio Spectrogram Transformer model for synthesizer sound matching on a 16-parameter dataset generated using the Massive synthesizer. In-domain examples consist of randomly generated Massive synthesizer one-shots from our test set, while out-of-domain examples include sounds from other synthesizers and vocal imitations of synth sounds. All samples are rendered at the same pitch as the input sample.

Abstract

Systems for synthesizer sound matching, which automatically set the parameters of a synthesizer to emulate an input sound, have the potential to make the process of synthesizer programming faster and easier for novice and experienced musicians alike, whilst also affording new means of interaction with synthesizers. Considering the enormous variety of synthesizers in the marketplace, and the high level of complexity of many leading software synthesizers, general-purpose sound matching systems that function with minimal knowledge or prior assumptions about the underlying synthesis architecture are particularly desirable. With this in mind, we introduce a synthesizer sound matching model based on the Audio Spectrogram Transformer (AST). We demonstrate the viability of this model by training on a large synthetic dataset of randomly generated samples from the popular Massive synthesizer. We show that this model can reconstruct parameters of samples generated from a set of 16 parameters, highlighting its improved fidelity relative to multi-layer perceptron (MLP) and convolutional neural network (CNN) baselines. We also provide audio examples demonstrating the out-of-domain performance of this model in emulating vocal imitations, sounds from other synthesizers, and other musical instrumental sounds.
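As a rough illustration of the setup described in the abstract, the sketch below draws a random 16-parameter preset and scores a predicted preset against it. It is only a sketch under stated assumptions: parameters are treated as normalized values in [0, 1], mean absolute error is used as an illustrative metric, and the "prediction" is simulated by adding small noise to the target. The names `random_preset` and `mean_absolute_error` are hypothetical, and none of this should be read as the training or evaluation procedure used in the paper.

```python
import random

# The page states a 16-parameter dataset; parameter names and ranges are not
# given here, so values normalized to [0, 1] are an assumption.
NUM_PARAMS = 16


def random_preset(rng, num_params=NUM_PARAMS):
    """Draw one random preset as a list of normalized parameter values."""
    return [rng.random() for _ in range(num_params)]


def mean_absolute_error(true_params, predicted_params):
    """Average per-parameter absolute difference between two presets."""
    if len(true_params) != len(predicted_params):
        raise ValueError("presets must have the same number of parameters")
    total = sum(abs(t - p) for t, p in zip(true_params, predicted_params))
    return total / len(true_params)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
target = random_preset(rng)

# Stand-in for a model prediction (assumption): the target plus small noise,
# clipped back into [0, 1]. A real system would predict from the audio input.
predicted = [min(1.0, max(0.0, v + rng.uniform(-0.05, 0.05))) for v in target]

error = mean_absolute_error(target, predicted)
print(f"parameter MAE: {error:.4f}")
```

In an actual sound matching pipeline, the target preset would be rendered to audio by the synthesizer and the model would estimate the parameters from a spectrogram of that audio; the comparison step would then look broadly like the one above.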

Audio Examples

In-domain:

	Input: Random Massive preset	Output: Reconstructed Massive preset
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Sample 6
Sample 7
Sample 8
Sample 9
Sample 10
Sample 11
Sample 12

Out-of-domain:

	Input:	Output:
Sample 1 - Arp 5th
Sample 2 - Cymbal
Sample 3 - DX7 Bass
Sample 4 - Soft Synth
Sample 5 - Guitar
Sample 6 - Laser
Sample 7 - Metallic Synth
Sample 8 - Sub Bass
Sample 9 - Vocal Imitation 1
Sample 10 - Vocal Imitation 2
Sample 11 - Vocal Imitation 3
Sample 12 - Vocal Imitation 4