The Swiss Federal Institute of Technology in Lausanne (EPFL) is digitizing an exceptional collection of audio and video recordings of Montreux Jazz Festival (MJF) concerts. Since 1967, five thousand hours of audio and video have been recorded, of which about 60% have been digitized so far. To make these archives easily manageable, ensure the correctness of the supplied metadata, and facilitate copyright management, one of the desired tasks is to know exactly how many songs are present in a given concert and to identify each of them individually, even in very problematic cases (such as medleys or long improvisational passages). However, given the sheer volume of recordings to process, having a person listen to each concert and identify every song is a cumbersome and time-consuming task. Consequently, it is essential to automate the process.

To that end, this paper describes a strategy for automatically detecting the most important changes in the audio recording of a concert; for MJF concerts, those changes correspond to song transitions, interludes, or applause. The presented method belongs to the family of audio novelty-based segmentation methods. The general idea is to first divide a whole concert into short frames, each a few milliseconds long, from which well-chosen audio features are extracted. A similarity matrix is then computed, which captures the similarity between each pair of frames. Next, a kernel is correlated along the diagonal of the similarity matrix to obtain the audio novelty scores. Finally, peak detection is used to find significant peaks in the scores, which are suggestive of a change. The main advantage of such a method is that, unlike most classical segmentation algorithms, no training step is required. Additionally, relatively few audio features are needed, which reduces both the amount of computation and the run time.
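The pipeline described above (frames → similarity matrix → kernel correlation → peak detection) can be sketched as follows. This is a minimal illustration of the general novelty-based approach, not the paper's exact implementation: the feature extraction step is omitted (a precomputed feature matrix is assumed as input), and the kernel size, threshold, and minimum peak spacing are illustrative parameters.

```python
import numpy as np

def checkerboard_kernel(size):
    # Gaussian-tapered checkerboard kernel: positive on same-segment
    # quadrants, negative on cross-segment quadrants.
    idx = np.arange(size) - size // 2 + 0.5
    gauss = np.exp(-(idx / (0.5 * size)) ** 2)
    return np.outer(gauss, gauss) * np.outer(np.sign(idx), np.sign(idx))

def novelty_curve(features, kernel_size=64):
    # features: (n_frames, n_dims) matrix of per-frame audio features.
    # Cosine similarity between every pair of frames.
    norm = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    S = norm @ norm.T
    K = checkerboard_kernel(kernel_size)
    half = kernel_size // 2
    scores = np.zeros(S.shape[0])
    # Correlate the kernel along the main diagonal of S.
    for i in range(half, S.shape[0] - half):
        scores[i] = np.sum(K * S[i - half:i + half, i - half:i + half])
    return scores

def pick_peaks(scores, threshold, min_gap):
    # Local maxima above threshold, at least min_gap frames apart.
    peaks = []
    for i in range(1, len(scores) - 1):
        if (scores[i] > threshold
                and scores[i] >= scores[i - 1]
                and scores[i] >= scores[i + 1]
                and (not peaks or i - peaks[-1] >= min_gap)):
            peaks.append(i)
    return peaks
```

On a uniform region of the similarity matrix the checkerboard kernel sums to zero, so the novelty score stays near zero inside a song and spikes where the local self-similarity structure changes, e.g. at a song boundary.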
It is expected that such preprocessing will speed up the song identification process: instead of having to listen to hours of music, the algorithm produces markers indicating where to start listening. The presented method is evaluated on real concert recordings that have been segmented by hand, and its performance is compared to the state of the art.
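The paper's exact evaluation protocol is not spelled out here, but a common way to score detected boundaries against a hand-made segmentation is to match each detection to an annotated boundary within a tolerance window and report precision and recall. The sketch below assumes boundaries are given as timestamps in seconds; the tolerance value is an illustrative choice.

```python
def boundary_precision_recall(detected, reference, tolerance):
    # Greedy one-to-one matching: a detected boundary counts as a hit
    # if an unmatched reference boundary lies within +/- tolerance.
    matched = set()
    hits = 0
    for d in sorted(detected):
        for j, r in enumerate(sorted(reference)):
            if j not in matched and abs(d - r) <= tolerance:
                matched.add(j)
                hits += 1
                break
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall
```

With, say, hand-annotated boundaries at 10 s, 60 s, and 90 s and detections at 10.2 s, 55.0 s, and 90.1 s under a 1-second tolerance, two of three detections are hits, giving precision and recall of 2/3 each.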
Marjorie Platon, Manon Velasco
Olaf Blanke, Simon Gallo, Giulio Rognini