
# Fractal additive synthesis

Abstract

Musical and audio signals in general form a major part of the large amount of data exchanged in our information-based society. Transmission of high-quality audio signals through narrow-band channels, such as the Internet, requires refined methods for modeling and coding sound. The first important step is the development of new analysis techniques able to discriminate between sound components according to effective perceptual criteria. Our ultimate goal is to develop an optimal representation in a psychoacoustical sense, providing minimum rate and minimum "perceptual distortion" at the same time. One of the most challenging aspects of this task is the definition of a good model for the representation of the different components of sound. Musical and speech signals contain both deterministic and stochastic components. In voiced sounds the deterministic part provides the pitch and the global timbre: it is in a sense the fundamental structure of a sound and can be easily represented by means of a very restricted set of parameters. The stochastic part contains what we might call the "life of a sound", that is, an ensemble of microfluctuations with respect to an electronic-like, non-evolving sound, as well as noise due to the physical excitation system. Reproducing the latter is of fundamental importance for a sound to be perceived as natural.

We faced this challenge by developing a new sound analysis/synthesis method called Fractal Additive Synthesis (FAS). The first step was the definition of a new class of wavelet transforms, namely the Harmonic-Band Wavelet Transform (HBWT). This transform is based on a cascade of the Modified Discrete Cosine Transform (MDCT) and the Wavelet Transform (WT). By means of the HBWT, we are able to separate the stochastic from the deterministic components of sound and to treat them separately. The second step was the definition of a model for the stochastic components.
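The HBWT cascade described above can be illustrated with a toy sketch. The function names (`harmonic_subbands`, `haar_analysis`) are my own, and naive FFT masking stands in for the perfect-reconstruction cosine-modulated (MDCT-type) filter banks actually used in the HBWT; the point is only the structure: a harmonic-band split followed by a wavelet analysis of each subband.

```python
import numpy as np

def harmonic_subbands(x, f0, sr, n_bands):
    """Split x into bands centred on the first n_bands harmonics of f0.
    Naive FFT masking stands in for the perfect-reconstruction
    cosine-modulated filter banks used in the real HBWT."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    bands = []
    for k in range(1, n_bands + 1):
        mask = np.abs(freqs - k * f0) < f0 / 2   # k-th partial plus its sidebands
        bands.append(np.fft.irfft(X * mask, n=len(x)))
    return bands

def haar_analysis(s, levels):
    """Orthonormal Haar wavelet analysis of one subband signal.
    Returns [d1 (finest details), ..., dJ, aJ (coarsest approximation)]."""
    coeffs, a = [], np.asarray(s, dtype=float)
    for _ in range(levels):
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)
        coeffs.append(d)
    coeffs.append(a)
    return coeffs
```

Summing the subbands reconstructs a purely harmonic input, while the wavelet coefficients of each band expose the sideband (stochastic) content around the corresponding partial.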
The spectra of voiced musical sounds have non-zero energy in the sidebands of the spectral peaks. These sidebands contain information relative to the stochastic components. The effect of these components is that the waveform of what we call a pseudo-periodic signal, i.e. the stationary part of voiced sounds, changes slightly from period to period. Our work is based on the experimentally verified assumption that the energy distribution of a sideband of a voiced sound spectrum is mostly shaped like powers of the inverse of the distance from the closest partial. The power spectrum of these pseudo-periodic processes is then modeled by means of a superposition of modulated 1/f components, i.e., by means of what we define as a pseudo-periodic 1/f-like process. The time-scale character of the wavelet transform is well adapted to the self-similar behavior of 1/f processes. The wavelet analysis of 1/f noise yields a set of very loosely correlated coefficients that, to a first approximation, can be modeled by white noise in the synthesis. The fractal properties of 1/f noise also motivated our choice of the name Fractal Additive Synthesis.

The next step was the definition of a model for the deterministic components of voiced sounds, consistent with the HBWT analysis/synthesis method. The model is in some respects inspired by sinusoidal models. The two models provide a complete method for the analysis and resynthesis of voiced sounds in the perspective of structured audio (SA) sound representations. For the stationary part of voiced sounds, compression ratios in the range of 10-15:1 are easily achievable. Even better results in terms of data compression can be obtained by taking psychoacoustic criteria into consideration. A psychoacoustically based selection of perceptually relevant parameters was implemented and tested. Compression ratios of 20-30:1, depending on the musical instrument, were achieved.
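The 1/f synthesis idea above — drawing uncorrelated wavelet coefficients whose variance scales across levels, then inverting the transform — can be sketched minimally with Haar wavelets, in the style of Wornell's wavelet model of 1/f noise. The names and parameters here (`one_over_f`, `n_coarse`, the seed) are illustrative assumptions, not the FAS implementation, which applies this construction inside each harmonic sideband.

```python
import numpy as np

def haar_synthesis(coeffs):
    """Inverse of an orthonormal Haar analysis: coeffs = [d1, ..., dJ, aJ]."""
    a = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        up = np.empty(2 * len(a))
        up[0::2] = (a + d) / np.sqrt(2)
        up[1::2] = (a - d) / np.sqrt(2)
        a = up
    return a

def one_over_f(J, gamma=1.0, n_coarse=4, rng=None):
    """Draw uncorrelated Gaussian wavelet coefficients whose variance grows
    as 2**(gamma * level) towards coarse scales, then invert the transform.
    Yields an approximately 1/f**gamma process (Wornell-style model)."""
    if rng is None:
        rng = np.random.default_rng(0)
    coeffs = [rng.normal(scale=2 ** (gamma * lvl / 2),
                         size=n_coarse * 2 ** (J - lvl))
              for lvl in range(1, J + 1)]          # lvl 1 = finest details
    coeffs.append(rng.normal(scale=2 ** (gamma * J / 2), size=n_coarse))
    return haar_synthesis(coeffs)
```

Because the coefficients are independent white noise per level, resynthesis only needs the per-level variances — this is the source of the compact stochastic parameterization.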
An extension of the method, based on a pitch-synchronous version of the HBWT with perfect-reconstruction time-varying cosine-modulated filter banks, was also studied. This enables the method to handle, for instance, the slight pitch deviations or the vibrato of a musical tone, as well as larger pitch changes, as in a glissando. Finally, the method has been successfully extended to non-harmonic sounds by introducing an optimization procedure for the design of non-perfect-reconstruction cosine-modulated filter banks with inharmonic band subdivisions. These extensions make FAS more flexible and suitable for analyzing, encoding, processing and resynthesizing a large class of musical sounds. The final result of this work is a method for modeling, in a flexible way, both the stochastic and the deterministic parts of sounds at a very refined perceptual level and with a minimum number of parameters controlling the synthesis process. In the context of SA, the method provides a sound analysis/synthesis tool able to encode and resynthesize sounds at low rate, while maintaining their natural timbre dynamics for high-quality reproduction.


Related concepts

Harmonic

A harmonic is a sinusoidal wave with a frequency that is a positive integer multiple of the fundamental frequency of a periodic signal. The fundamental frequency is also called the 1st harmonic; the other harmonics are known as higher harmonics. As all harmonics are periodic at the fundamental frequency, the sum of harmonics is also periodic at that frequency. The set of harmonics forms a harmonic series. The term is employed in various disciplines, including music, physics, acoustics, electric power transmission, radio technology, and other fields.
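The definition above is easy to check numerically: harmonic frequencies are integer multiples of f0, and a sum of harmonics repeats with the fundamental period 1/f0. The helper name `harmonic_freqs` is illustrative.

```python
import numpy as np

def harmonic_freqs(f0, n):
    """Frequencies of the first n harmonics; the 1st harmonic is f0 itself."""
    return [k * f0 for k in range(1, n + 1)]

# A sum of harmonics (here with 1/k amplitudes) is periodic at f0:
f0, sr = 100.0, 44100
t = np.arange(sr) / sr                       # one second of samples
tone = sum(np.sin(2 * np.pi * f * t) / k
           for k, f in enumerate(harmonic_freqs(f0, 5), start=1))
```

Shifting `tone` by one fundamental period (sr / f0 = 441 samples) leaves it unchanged, since every harmonic completes an integer number of cycles in that time.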

Sound

In physics, sound is a vibration that propagates as an acoustic wave through a transmission medium such as a gas, liquid or solid. In human physiology and psychology, sound is the reception of such waves and their perception by the brain. Only acoustic waves that have frequencies lying between about 20 Hz and 20 kHz, the audio frequency range, elicit an auditory percept in humans. In air at atmospheric pressure, these correspond to sound waves with wavelengths of about 17 m to 17 mm. Sound waves above 20 kHz are known as ultrasound and are not audible to humans.
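The wavelength range quoted above follows directly from lambda = c / f, taking the speed of sound in air as roughly 343 m/s (at about 20 °C); the helper name is illustrative.

```python
# Wavelength of sound in air: lambda = c / f, with c ~ 343 m/s at 20 C.
def wavelength_m(freq_hz, c=343.0):
    return c / freq_hz

low_end = wavelength_m(20)        # 17.15 m at the 20 Hz end of hearing
high_end = wavelength_m(20_000)   # 0.01715 m, i.e. about 17 mm, at 20 kHz
```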

Stochastic

Stochastic (stəˈkæstɪk) refers to the property of being well described by a random probability distribution. Although stochasticity and randomness are distinct in that the former refers to a modeling approach and the latter refers to phenomena themselves, these two terms are often used synonymously. Furthermore, in probability theory, the formal concept of a stochastic process is also referred to as a random process.

Related MOOCs

Digital Signal Processing I

Basic signal processing concepts, Fourier analysis and filters. This module can be used as a starting point or a basic refresher in elementary DSP.

Digital Signal Processing II

Adaptive signal processing, A/D and D/A. This module provides the basic tools for adaptive filtering and a solid mathematical framework for sampling and quantization.

Digital Signal Processing III

Advanced topics: this module covers real-time audio processing (with
examples on a hardware board), image processing and communication system design.