A Musically Motivated Mid-Level Representation for Pitch Estimation and Musical Audio Source Separation

When designing an audio processing system, the target tasks often influence the choice of a data representation or transformation. Low-level time-frequency representations such as the short-time Fourier transform (STFT) are popular because they offer meaningful insight into sound properties at a low computational cost. Conversely, when higher-level semantics such as pitch, timbre, or phonemes are sought, representations usually enhance the discriminative characteristics of those semantics at the expense of invertibility; they become so-called mid-level representations. In this paper, a source/filter signal model that provides a mid-level representation is proposed. This representation makes the pitch content of the signal, as well as some timbre information, available while keeping as much information from the raw data as possible. The model is successfully used within a main melody extraction system and a lead instrument/accompaniment separation system. Both frameworks obtained top results at several international evaluation campaigns.
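The abstract does not spell out the model equations. As a rough illustration only, the sketch below assumes a multiplicative source/filter model of the STFT power spectrogram, S ≈ (W_F H_F) ⊙ (W_0 H_0). Here W_0 is a fixed dictionary of harmonic combs on a semitone F0 grid (the "source"), W_F is a set of smooth Hann-shaped frequency bands (the "filter"), and the nonnegative activations are fit with Euclidean-distance multiplicative updates. The comb shapes, the band dictionary, the Euclidean cost, and every function name (`power_stft`, `harmonic_combs`, `smooth_bands`, `fit_source_filter`) are hypothetical choices made for this example, not the paper's exact formulation. Taking the argmax of H_0 in each frame then gives a crude pitch track for a synthetic 220 Hz harmonic tone.

```python
import numpy as np


def power_stft(x, n_fft=1024, hop=256):
    # Hann-windowed power spectrogram |STFT|^2; frames are taken without padding.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0)) ** 2  # shape (n_fft // 2 + 1, n_frames)


def harmonic_combs(freqs, f0_grid, width):
    # Hypothetical source dictionary: one flat harmonic comb per candidate F0,
    # with a Gaussian peak of std `width` (Hz) at every harmonic below Nyquist.
    nyquist = freqs[-1]
    W0 = np.zeros((len(freqs), len(f0_grid)))
    for k, f0 in enumerate(f0_grid):
        for h in range(1, int(nyquist // f0) + 1):
            W0[:, k] += np.exp(-0.5 * ((freqs - h * f0) / width) ** 2)
    return W0


def smooth_bands(n_bins, n_bands):
    # Hypothetical filter dictionary: overlapping Hann-shaped bands that sum to one,
    # so any combination of them is a smooth spectral envelope.
    centers = np.linspace(0, n_bins - 1, n_bands)
    spacing = centers[1] - centers[0]
    d = np.abs(np.arange(n_bins)[:, None] - centers[None, :]) / spacing
    return np.where(d < 1.0, 0.5 * (1.0 + np.cos(np.pi * d)), 0.0)


def fit_source_filter(S, W0, WF, n_iter=200, seed=0, eps=1e-12):
    # Alternate Euclidean multiplicative updates for S ~ (WF @ HF) * (W0 @ H0).
    rng = np.random.default_rng(seed)
    H0 = rng.random((W0.shape[1], S.shape[1])) + eps  # source (pitch) activations
    HF = rng.random((WF.shape[1], S.shape[1])) + eps  # filter (envelope) activations
    costs = []
    for _ in range(n_iter):
        A, B = WF @ HF, W0 @ H0
        H0 *= (W0.T @ (S * A)) / (W0.T @ (A * B * A) + eps)
        B = W0 @ H0
        HF *= (WF.T @ (S * B)) / (WF.T @ (A * B * B) + eps)
        A = WF @ HF
        costs.append(np.sum((S - A * B) ** 2))
    return H0, HF, costs


# Synthetic test signal (assumption): a 220 Hz tone with 8 harmonics of amplitude 1/h.
fs, n_fft = 8000, 1024
t = np.arange(int(0.5 * fs)) / fs
x = sum(np.sin(2 * np.pi * 220.0 * h * t) / h for h in range(1, 9))

S = power_stft(x, n_fft=n_fft)
S /= S.max()
freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
f0_grid = 110.0 * 2.0 ** (np.arange(37) / 12.0)  # semitone grid from 110 Hz to 880 Hz
W0 = harmonic_combs(freqs, f0_grid, width=fs / n_fft)
WF = smooth_bands(len(freqs), 20)

H0, HF, costs = fit_source_filter(S, W0, WF)
f0_track = f0_grid[np.argmax(H0, axis=0)]  # crude per-frame pitch estimate
print(f0_track)
```

In this sketch, H_0 acts as a pitch salience over the F0 grid, while W_F H_F carries a smooth spectral envelope, which is one way a representation can expose pitch and some timbre information at once. Comparing the product (W_F H_F) ⊙ (W_0 H_0) with S shows how much of the raw spectrogram the model retains.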
