The Shape of RemiXXXes to Come: Audio Texture Synthesis with Time–frequency Scattering

Conference paper
Vincent Lostanlen, Florian Hecker
Proceedings of the International Conference on Digital Audio Effects (DAFx)
Publication year: 2019

This article explains how to apply time–frequency scattering, a convolutional operator that extracts modulations in the time–frequency domain at different rates and scales, to the re-synthesis and manipulation of audio textures.
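
As a rough illustration of the re-synthesis principle, the sketch below matches time-averaged scalogram coefficients of a target signal by gradient descent. It uses a simplified Gabor filterbank as a stand-in for the full time–frequency scattering transform, and the random "target" texture, filterbank parameters, and optimizer settings are placeholders rather than the paper's setup.

```python
import numpy as np
import torch
import torch.nn.functional as F

def gabor_bank(n_samples, n_filters=48, fmin=0.005, fmax=0.45):
    """Frequency responses of a bank of Gaussian (Gabor-like) band-pass filters."""
    freqs = np.fft.fftfreq(n_samples)                    # normalized frequencies
    centers = np.geomspace(fmin, fmax, n_filters)        # log-spaced center frequencies
    bandwidths = centers / 8                              # constant-Q bandwidths
    bank = np.exp(-0.5 * ((freqs[None, :] - centers[:, None]) / bandwidths[:, None]) ** 2)
    return torch.tensor(bank, dtype=torch.cfloat)

def scalogram(x, bank):
    """Modulus of band-pass filtered signals: a differentiable scalogram."""
    X = torch.fft.fft(x)
    return torch.abs(torch.fft.ifft(bank * X))            # (n_filters, n_samples)

def features(x, bank, pool=512):
    """Time-averaged scalogram, a crude stand-in for scattering coefficients."""
    return F.avg_pool1d(scalogram(x, bank)[None], pool)[0]

# Hypothetical target texture: white noise stands in for a recorded sound.
n = 2 ** 16
target = torch.randn(n)
bank = gabor_bank(n)
with torch.no_grad():
    target_feats = features(target, bank)

# Re-synthesis: start from noise and match the target's coefficients by gradient descent.
x = torch.randn(n, requires_grad=True)
optimizer = torch.optim.Adam([x], lr=0.05)
for step in range(200):
    optimizer.zero_grad()
    loss = torch.mean((features(x, bank) - target_feats) ** 2)
    loss.backward()
    optimizer.step()
```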

Adaptive Time–Frequency Scattering for Periodic Modulation Recognition in Music Signals

Conference paper
Changhong Wang, Vincent Lostanlen, Emmanouil Benetos, Elaine Chew
Proceedings of the International Society for Music Information Retrieval (ISMIR) Conference
Publication year: 2019

Vibratos, tremolos, trills, and flutter-tongue are techniques frequently found in vocal and instrumental music. A common feature of these techniques is the periodic modulation in the time–frequency domain. We propose a representation based on time–frequency scattering to model the interclass variability for fine discrimination of these periodic modulations. Time–frequency scattering is an instance of the scattering transform, an approach for building invariant, stable, and informative signal representations. The proposed representation is calculated around the wavelet subband of maximal acoustic energy, rather than over all the wavelet bands. To demonstrate the feasibility of this approach, we build a system that computes the representation as input to a machine learning classifier. Whereas previously published datasets for playing technique analysis focus primarily on techniques recorded in isolation, for ecological validity, we create a new dataset to evaluate the system. The dataset, named CBF-periDB, contains full-length expert performances on the Chinese bamboo flute that have been thoroughly annotated by the players themselves. We report F-measures of 99% for flutter-tongue, 82% for trill, 69% for vibrato, and 51% for tremolo detection, and provide explanatory visualisations of scattering coefficients for each of these techniques.
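
The sketch below illustrates the general idea of computing a modulation feature around the maximal-energy subband and feeding it to a classifier. It uses a constant-Q transform and a plain modulation spectrum instead of the scattering transform, and the file names, labels, and feature dimensions are hypothetical placeholders for CBF-periDB excerpts.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def adaptive_modulation_features(path, sr=22050, bins_per_octave=12, n_bins=84):
    """Modulation spectrum around the maximal-energy CQT subband (simplified)."""
    y, sr = librosa.load(path, sr=sr)
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=256,
                           n_bins=n_bins, bins_per_octave=bins_per_octave))
    k = int(np.argmax(C.sum(axis=1)))                  # subband of maximal acoustic energy
    lo, hi = max(0, k - 2), min(n_bins, k + 3)         # a few neighboring subbands
    band = np.log1p(C[lo:hi])                          # compress dynamics
    # FFT along time captures the rate of periodic modulations (vibrato, trill, ...).
    mod = np.abs(np.fft.rfft(band - band.mean(axis=1, keepdims=True), axis=1))
    return mod.mean(axis=0)[:64]                       # assumes recordings of a few seconds or more

# Hypothetical file lists and labels; CBF-periDB provides the annotated performances.
train_files = ["vibrato_01.wav", "trill_01.wav"]
train_labels = ["vibrato", "trill"]
X_train = np.stack([adaptive_modulation_features(f) for f in train_files])
clf = SVC(kernel="rbf").fit(X_train, train_labels)
```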

BirdVox-full-night: A Dataset and Benchmark for Avian Flight Call Detection

Conference paper
Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Publication year: 2018

This article addresses the automatic detection of vocal, nocturnally migrating birds from a network of acoustic sensors. Thus far, owing to the lack of annotated continuous recordings, existing methods have been benchmarked in a binary classification setting (presence vs. absence). Instead, with the aim of comparing them in event detection, we release BirdVox-full-night, a dataset of 62 hours of audio comprising 35,402 flight calls of nocturnally migrating birds, as recorded from 6 sensors. We find a large performance gap between energy-based detection functions and data-driven machine listening. The best model is a deep convolutional neural network trained with data augmentation. We correlate recall with the density of flight calls over time and frequency, and identify the main causes of false alarms.
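
For reference, the following sketch shows the kind of energy-based detection function that such benchmarks compare against: band-limited spectral energy over time followed by peak picking. The file name, frequency band, and thresholds are illustrative and not those used in the paper.

```python
import numpy as np
import librosa
from scipy.signal import find_peaks

def energy_detection_function(path, sr=22050, fmin=2000, fmax=10000):
    """Band-limited spectral energy over time: a simple flight-call detection baseline."""
    y, sr = librosa.load(path, sr=sr)
    S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=1024)
    band = (freqs >= fmin) & (freqs <= fmax)           # flight calls occupy a high-frequency band
    energy = S[band].sum(axis=0)
    return (energy - energy.mean()) / (energy.std() + 1e-9)

# Peak picking on the detection function yields candidate event times.
detection = energy_detection_function("sensor_unit01.wav")    # hypothetical recording
peaks, _ = find_peaks(detection, height=3.0, distance=20)     # thresholds are illustrative
times = librosa.frames_to_time(peaks, sr=22050, hop_length=256)
```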

Deep Convolutional Networks on the Pitch Spiral for Musical Instrument Recognition

Conference paper
Vincent Lostanlen, Carmine-Emanuele Cella
Proceedings of the International Society for Music Information Retrieval (ISMIR) Conference
Publication year: 2016

Musical performance combines a wide range of pitches, nuances, and expressive techniques. Audio-based classification of musical instruments thus requires building signal representations that are invariant to such transformations. This article investigates the construction of learned convolutional architectures for instrument recognition, given a limited amount of annotated training data. In this context, we benchmark three different weight sharing strategies for deep convolutional networks in the time–frequency domain: temporal kernels; time–frequency kernels; and a linear combination of time–frequency kernels which are one octave apart, akin to a Shepard pitch spiral. We provide an acoustical interpretation of these strategies within the source-filter framework of quasi-harmonic sounds with a fixed spectral envelope, which are archetypal of musical notes. The best classification accuracy is obtained by hybridizing all three convolutional layers into a single deep learning architecture.
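
The snippet below schematically contrasts the three weight sharing strategies on a constant-Q input. The layer sizes, kernel shapes, and input dimensions are illustrative only and do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn

# Input: constant-Q scalogram with 96 bins = 8 octaves x 12 bins per octave.
bins_per_octave, n_octaves, n_frames = 12, 8, 128
x = torch.randn(1, 1, n_octaves * bins_per_octave, n_frames)   # (batch, channel, freq, time)

# (a) Temporal kernels: convolve along time only.
conv_time = nn.Conv2d(1, 16, kernel_size=(1, 9), padding=(0, 4))

# (b) Time-frequency kernels: small 2D patches in the time-frequency plane.
conv_tf = nn.Conv2d(1, 16, kernel_size=(5, 9), padding=(2, 4))

# (c) Spiral kernels: fold frequency into (octave, pitch class) and also convolve
#     across neighboring octaves, tying together bins that are one octave apart.
x_spiral = x.reshape(1, 1, n_octaves, bins_per_octave, n_frames)
conv_spiral = nn.Conv3d(1, 16, kernel_size=(3, 5, 9), padding=(1, 2, 4))

y_time = conv_time(x)             # (1, 16, 96, 128)
y_tf = conv_tf(x)                 # (1, 16, 96, 128)
y_spiral = conv_spiral(x_spiral)  # (1, 16, 8, 12, 128)
```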

Wavelet Scattering on the Pitch Spiral

Conference paper
Vincent Lostanlen, Stéphane Mallat
Proceedings of the International Conference on Digital Audio Effects (DAFx)
Publication year: 2015

We present a new representation of harmonic sounds that linearizes the dynamics of pitch and spectral envelope, while remaining stable to deformations in the time–frequency plane. It is an instance of the scattering transform, a generic operator which cascades wavelet convolutions and modulus nonlinearities. It is derived from the pitch spiral, in that convolutions are successively performed in time, log-frequency, and octave index. We give a closed-form approximation of spiral scattering coefficients for a nonstationary generalization of the harmonic source–filter model.
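
A schematic of this cascade is sketched below, with Fourier-domain Gaussian band-pass filters standing in for wavelets and a random array standing in for a scalogram reshaped onto the pitch spiral (octave × chroma × time). Filter parameters and array sizes are placeholders, and a single filter per axis replaces the full wavelet filterbank.

```python
import numpy as np

def wavelet_modulus(u, axis, center=0.25, bandwidth=0.1):
    """Band-pass filter along one axis (Fourier-domain Gaussian) followed by a modulus."""
    n = u.shape[axis]
    freqs = np.fft.fftfreq(n)
    h = np.exp(-0.5 * ((freqs - center) / bandwidth) ** 2)   # one band-pass filter per axis
    shape = [1] * u.ndim
    shape[axis] = n
    U = np.fft.fft(u, axis=axis) * h.reshape(shape)
    return np.abs(np.fft.ifft(U, axis=axis))

# Hypothetical scalogram reshaped onto the pitch spiral:
# axis 0 = octave index, axis 1 = chroma (log-frequency within the octave), axis 2 = time.
scalogram = np.random.rand(8, 12, 256)

# Spiral scattering cascade: wavelet modulus along time, then log-frequency, then octaves.
s1 = wavelet_modulus(scalogram, axis=2)   # temporal modulations (e.g., amplitude modulation)
s2 = wavelet_modulus(s1, axis=1)          # modulations across log-frequency (spectral envelope)
s3 = wavelet_modulus(s2, axis=0)          # modulations across octaves (harmonic structure)
```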