Publication: Fractal additive synthesis

Abstract

Musical and audio signals in general form a major part of the large amount of data exchange taking place in our information-based society. Transmission of high quality audio signals through narrow-band channels, such as the Internet, requires refined methods for modeling and coding sound. The first important step is the development of new analysis techniques able to discriminate between sound components according to effective perceptual criteria. Our ultimate goal is to develop an optimal representation in a psychoacoustical sense, providing minimum rate and minimum "perceptual distortion" at the same time. One of the most challenging aspects of this task is the definition of a good model for the representation of the different components of sound. Musical and speech signals contain both deterministic and stochastic components. In voiced sounds the deterministic part provides the pitch and the global timbre: it is in a sense the fundamental structure of a sound and can be easily represented by means of a very restricted set of parameters. The stochastic part contains what we might call the "life of a sound", that is, an ensemble of microfluctuations with respect to an electronic-like/non-evolving sound as well as noise due to the physical excitation system. Reproducing the latter is essential for a sound to be perceived as natural. We faced this challenge by developing a new sound analysis/synthesis method called Fractal Additive Synthesis (FAS). The first step was the definition of a new class of wavelet transforms, namely the Harmonic-Band Wavelet Transform (HBWT). This transform is based on a cascade of Modified Discrete Cosine Transforms (MDCT) and Wavelet Transforms (WT). By means of the HBWT, we are able to separate the stochastic from the deterministic components of sound and to treat them separately. The second step was the definition of a model for the stochastic components.
The spectra of voiced musical sounds have non-zero energy in the sidebands of the spectral peaks. These sidebands contain information relative to the stochastic components. The effect of these components is that the waveform of what we call a pseudo-periodic signal, i.e. the stationary part of voiced sounds, changes slightly from period to period. Our work is based on the experimentally verified assumption that the energy distribution of a sideband of a voiced sound spectrum is mostly shaped like powers of the inverse of the distance from the closest partial. The power spectrum of these pseudo-periodic processes is then modeled by means of a superposition of modulated 1/f components, i.e., by means of what we define as a pseudo-periodic 1/f-like process. The time-scale character of the wavelet transform is well adapted to the self-similar behavior of 1/f processes. The wavelet analysis of 1/f noise yields a set of very loosely correlated coefficients that, to a first approximation, can be well modeled by white noise in the synthesis. The fractal properties of the 1/f noise also motivated our choice of the name Fractal Additive Synthesis. The next step was the definition of a model for the deterministic components of voiced sounds, consistent with the HBWT analysis/synthesis method. The model is in some respects inspired by sinusoidal models. The two models provide a complete method for the analysis and resynthesis of voiced sounds in the perspective of structured audio (SA) sound representations. For the stationary part of voiced sounds, compression ratios in the range of 10-15:1 are easily achievable. Even better results in terms of data compression can be obtained by taking psychoacoustic criteria into consideration. A psychoacoustics-based selection of perceptually relevant parameters was implemented and tested. Compression ratios of 20-30:1, depending on the musical instrument, were achieved.
An extension of the method based on a pitch-synchronous version of the HBWT with perfect reconstruction time-varying cosine-modulated filter banks was also studied. This makes the method able to handle, for instance, the slight pitch deviations or the vibrato of a musical tone, or larger changes of pitch as in a glissando. Finally, the method has been successfully extended to non-harmonic sounds by the introduction and definition of an optimization procedure for the design of non-perfect reconstruction cosine-modulated filter banks with inharmonic band subdivisions. These extensions make FAS more flexible and suitable to analyze, encode, process and resynthesize a large class of musical sounds. The final result of this work is the development of a method for modeling in a flexible way both the stochastic and the deterministic parts of sounds at a very refined perceptual level and with a minimum number of parameters controlling the synthesis process. In the context of SA the method provides a sound analysis/synthesis tool able to encode and to resynthesize sounds at a low rate, while maintaining their natural timbre dynamics for high quality reproduction.
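
The abstract's remark that the wavelet coefficients of 1/f noise can be modeled by white noise in the synthesis can be illustrated with a small sketch. This is not the HBWT or the FAS implementation: it assumes a plain dyadic Haar synthesis bank instead of harmonic bands, and it assumes the commonly used rule that the coefficient variance grows by a factor of 2^gamma per coarser scale. The function name `synth_one_over_f` and the parameters `levels`, `gamma`, `sigma` and `seed` are hypothetical and chosen for illustration only.

```python
import math
import random


def synth_one_over_f(levels, gamma=1.0, sigma=1.0, seed=0):
    """Approximate 1/f^gamma noise by inverse Haar synthesis of white coefficients.

    Returns (samples, details). `samples` has length 2**levels; `details`
    lists the detail coefficients per scale, coarsest scale first.
    Assumption: coefficient variance at scale j (j = 1 is the finest)
    is sigma**2 * 2**(gamma * j).
    """
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(2.0)
    current = [0.0]  # zero coarsest approximation, so the process is zero-mean
    details = []
    for level in range(levels):
        j = levels - level  # scale index: `levels` is the coarsest, 1 the finest
        std = sigma * 2.0 ** (gamma * j / 2.0)
        # Uncorrelated Gaussian coefficients: the "white noise" of the model.
        detail = [rng.gauss(0.0, std) for _ in current]
        details.append(detail)
        # One orthonormal inverse Haar step doubles the number of samples.
        nxt = []
        for a, d in zip(current, detail):
            nxt.append((a + d) * s)
            nxt.append((a - d) * s)
        current = nxt
    return current, details


if __name__ == "__main__":
    samples, details = synth_one_over_f(levels=10, gamma=1.0, seed=1)
    print(len(samples), [len(d) for d in details])
```

With `gamma=0` every scale gets the same variance and the output is white noise; larger `gamma` shifts energy toward the coarse scales, that is, toward low frequencies. In a harmonic-band setting one would run such a synthesis in each sideband of each partial, which this sketch does not attempt.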

Related MOOCs (6)

Related concepts (42)

Related publications (167)

Digital Signal Processing [retired]

The course provides a comprehensive overview of digital signal processing theory, covering discrete time, Fourier analysis, filter design, sampling, interpolation and quantization; it also includes a

Digital Signal Processing

Digital Signal Processing is the branch of engineering that, in the space of just a few decades, has enabled unprecedented levels of interpersonal communication and of on-demand entertainment. By rewo

Digital Signal Processing I

Basic signal processing concepts, Fourier analysis and filters. This module can be used as a starting point or a basic refresher in elementary DSP.

[Figure: Daubechies wavelet of order 2.] A wavelet is the function underlying wavelet decomposition, a decomposition similar to the short-time Fourier transform that is used in signal processing. It corresponds to the intuitive idea of a function representing a small oscillation, hence its name. However, it differs from the short-time Fourier transform in two major ways: it can use a different basis, not necessarily sinusoidal; and the width of the envelope is tied to the frequency of the oscillations, so that the whole wavelet is scaled, not only the oscillation.

In numerical analysis and functional analysis, a discrete wavelet transform (DWT) is any wavelet transform for which the wavelets are discretely sampled. As with other wavelet transforms, a key advantage it has over Fourier transforms is temporal resolution: it captures both frequency and location information (location in time). Haar wavelet: the first DWT was invented by Hungarian mathematician Alfréd Haar. For an input represented by a list of numbers, the Haar wavelet transform may be considered to pair up input values, storing the difference and passing the sum.
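
The pairing step described above (store the difference, pass the sum) can be sketched as follows. This is a minimal illustration assuming the orthonormal scaling by 1/sqrt(2) and an input whose length is a power of two; the function names `haar_step`, `haar_dwt` and `haar_idwt` are hypothetical and do not come from any library.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal scaling factor (an assumption of this sketch)


def haar_step(values):
    """One Haar level: pair up inputs, return (scaled sums, scaled differences)."""
    if len(values) % 2:
        raise ValueError("input length must be even")
    pairs = list(zip(values[0::2], values[1::2]))
    approx = [(a + b) * _S for a, b in pairs]
    detail = [(a - b) * _S for a, b in pairs]
    return approx, detail


def haar_dwt(values):
    """Full Haar DWT. Returns (approximation, details), details coarsest first."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    approx = list(values)
    details = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)  # the sums are passed on to the next level
        details.append(detail)              # the differences are stored
    return approx[0], details[::-1]


def haar_idwt(approx, details):
    """Inverse of haar_dwt: rebuild the signal from coarsest to finest scale."""
    current = [approx]
    for detail in details:
        nxt = []
        for a, d in zip(current, detail):
            nxt.append((a + d) * _S)
            nxt.append((a - d) * _S)
        current = nxt
    return current


if __name__ == "__main__":
    signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
    a, d = haar_dwt(signal)
    print(a, d)
    print(haar_idwt(a, d))
```

Because each step is orthonormal, the transform preserves the signal energy and `haar_idwt` recovers the input up to floating-point rounding. A constant input produces all-zero detail coefficients.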

In mathematics, a wavelet series is a representation of a square-integrable (real- or complex-valued) function by a certain orthonormal series generated by a wavelet. This article provides a formal, mathematical definition of an orthonormal wavelet and of the integral wavelet transform. A function is called an orthonormal wavelet if it can be used to define a Hilbert basis, that is a complete orthonormal system, for the Hilbert space of square integrable functions.

David Atienza Alonso, Vincent Stadelmann, Tomas Teijeiro Campo, Jérôme Paul Rémy Thevenot, Christodoulos Kechris

Acoustical knee health assessment has long promised an alternative to clinically available medical imaging tools, but this modality has yet to be adopted in medical practice. The field is currently led by machine learning models processing acoustical featu ...

2024

The usual explanation of the efficacy of wavelet-based methods hinges on the sparsity of many real-world objects in the wavelet domain. Yet, standard wavelet-shrinkage techniques for sparse reconstruction are not competitive in practice, one reason being t ...

Point cloud representation is a popular modality to code immersive 3D contents. Several solutions and standards have been recently proposed in order to efficiently compress the large volume of data that point clouds require, in order to make them feasible ...

2022