Recent advances in computer technology have pushed the computing power of modern systems to unprecedented levels: processors have moved from conventional scalar architectures to more sophisticated parallel ones that combine high clock speeds with multiple processing units. This jump in computing power has made possible real-time implementations of many 3D audio algorithms that were hard to imagine 10-15 years ago. At the same time, steadily growing customer demand for real-time manipulation of multimedia content drives the development and implementation of more complex, and correspondingly more computationally demanding, algorithms. Delivering such sophisticated solutions to the market in a short time is already a difficult task; achieving efficient solutions on such short timescales, while squeezing the maximum computing power out of currently available technologies, is even more challenging. In my thesis, I propose a solution to this problem by designing and implementing a framework that allows the developer to quickly design, implement and test custom 3D audio algorithms efficiently, with minimal effort to port existing code to another platform. The proposed approach started from a preliminary review and analysis of the available development tools, to identify and evaluate the state of the art from a 3D audio point of view. This analysis revealed some important drawbacks of existing solutions in terms of fast reconfigurability for implementing custom 3D audio algorithms, and in their capability for precise synchronization and management of applications with multiple media streams. First, in some cases it is necessary to develop completely new code to achieve the desired results, due to a lack of flexibility in the algorithm configurations or to unavailable functionality.
Secondly, integrating different solutions and APIs that run at different hardware and software levels causes synchronization problems between them, since high-level APIs often introduce high latencies. The result is in some cases system instability, with consequently degraded quality, or even loss, of audio content in live and networked scenarios. Finally, it was observed that higher-level, developer-friendly 3D sound APIs normally bring with them high latencies, sometimes rigid 3D model solutions, or high portability costs. Lower-level solutions, conceived for platform-independent low-latency development and true real-time performance, can on the other hand make the development of existing and particularly new 3D models lengthy if good and robust source code is not available. To improve the situation, a new approach has been conceived: an independent middle-to-low-level DSP library that provides a mid-layer development tool for easily designing and configuring new 3D algorithms. At the same time, it simplifies implementation by providing the main building blocks in the field of signal processing and media streams, while leaving the developer enough flexibility and reconfigurability to implement 3D audio models in a short time with the desired precision and quality. An object-oriented, two-level hierarchical structure has been adopted for the library. At the root of the hierarchy, some properties common to all primitives have been defined (e.g. number of input/output channels, single-sample and block processing functions, etc.) and assigned to an object shared by all the DSP primitives (Parent Node). The first level (Atomic Level) contains the DSP primitives that cannot be built from others: multiplier, adder, multiply-add, delay line, etc. Ideally, this layer corresponds to the instruction set and memory structures of a virtual machine designed for signal processing.
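The root object and Atomic Level described above can be illustrated with a minimal sketch. The abstract does not give the library's actual class names, signatures or implementation language, so everything below (`Node`, `process_sample`, `process_block`, and the primitive classes) is a hypothetical Python rendering of the stated idea: a common parent that carries channel counts and single-sample/block processing, and primitives that cannot be decomposed further.

```python
from abc import ABC, abstractmethod
from collections import deque


class Node(ABC):
    """Hypothetical Parent Node: properties shared by every DSP primitive."""

    def __init__(self, num_inputs, num_outputs):
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs

    @abstractmethod
    def process_sample(self, *inputs):
        """Consume one sample per input channel and return one output sample."""

    def process_block(self, *blocks):
        """Block processing: apply process_sample frame by frame."""
        return [self.process_sample(*frame) for frame in zip(*blocks)]


class Multiplier(Node):
    """Atomic primitive: y = gain * x."""

    def __init__(self, gain):
        super().__init__(1, 1)
        self.gain = gain

    def process_sample(self, x):
        return self.gain * x


class Adder(Node):
    """Atomic primitive: y = a + b."""

    def __init__(self):
        super().__init__(2, 1)

    def process_sample(self, a, b):
        return a + b


class MultiplyAdd(Node):
    """Atomic primitive: y = gain * a + b."""

    def __init__(self, gain):
        super().__init__(2, 1)
        self.gain = gain

    def process_sample(self, a, b):
        return self.gain * a + b


class DelayLine(Node):
    """Atomic primitive: y[n] = x[n - length], with zero initial state."""

    def __init__(self, length):
        if length < 1:
            raise ValueError("length must be at least 1 sample")
        super().__init__(1, 1)
        self.buffer = deque([0.0] * length, maxlen=length)

    def process_sample(self, x):
        delayed = self.buffer[0]   # oldest stored sample
        self.buffer.append(x)      # pushes the oldest sample out
        return delayed
```

In this sketch, `DelayLine(2).process_block([1.0, 2.0, 3.0, 4.0])` returns `[0.0, 0.0, 1.0, 2.0]`. Putting block processing in the parent means each primitive only has to define its per-sample behaviour; a performance-oriented library would presumably override it with vectorised code.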
The second level (Basic DSP Level) contains generic DSP primitives, e.g. low-pass filter, all-pass, nested all-pass, double nested all-pass, generic FIR and IIR filters, geometric and trigonometric operators (on vectors), FDN matrices, etc. This level is built from the primitives defined in the Atomic Level. Finally, as a test application, a 3D Audio layer has been implemented containing 3D algorithms compatible first with the standardized VRML and MPEG-4 geometrical and perceptual room models, and in addition several other algorithms such as the Schroeder reverberator, the Dattorro network reverberator, the Gardner reverberator, etc. This allowed the successful integration of the proposed 3D audio library into an MPEG-4 compatible player, which was implemented within the scope of the CARROUSO multimedia project. CARROUSO addressed the problem of how to transfer sound fields generated in a real or virtual space to another, usually remote, space. The activities in CARROUSO concentrated on the development and implementation of key components for recording, encoding, transmission and rendering, including hardware, software and algorithms. Transmission and coding were based on MPEG-4 and digital video broadcasting (DVB) technology, while sound recording and reproduction were based on wave field synthesis (WFS) technology. In this context, the MPEG-4 player is an essential part of the chain, displaying the video content of the virtual scene and providing the interface between the decoded sound content and the WFS rendering system. In contrast to state-of-the-art multi-channel sound systems, the new system has an enlarged listening area and is more flexible with respect to the number and configuration of loudspeakers.
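To illustrate how a Basic DSP Level primitive can be assembled from Atomic Level ones, as described above, here is a minimal sketch of a Schroeder all-pass section built only from a delay line and multiply-add operations. It is not the thesis code: the names `DelayLine`, `multiply_add` and `AllPass` are hypothetical, and the single-delay-line form used here is one common textbook realisation, assumed for illustration.

```python
from collections import deque


class DelayLine:
    """Illustrative atomic primitive: stores the last `length` samples."""

    def __init__(self, length):
        if length < 1:
            raise ValueError("length must be at least 1 sample")
        self.buffer = deque([0.0] * length, maxlen=length)

    def read(self):
        return self.buffer[0]      # sample written `length` steps ago

    def write(self, x):
        self.buffer.append(x)      # pushes the oldest sample out


def multiply_add(gain, a, b):
    """Illustrative atomic primitive: gain * a + b."""
    return gain * a + b


class AllPass:
    """Schroeder all-pass, H(z) = (-g + z^-D) / (1 - g z^-D)."""

    def __init__(self, delay, gain):
        self.delay_line = DelayLine(delay)
        self.gain = gain

    def process_sample(self, x):
        delayed = self.delay_line.read()              # v[n - D]
        v = multiply_add(self.gain, delayed, x)       # v[n] = g*v[n-D] + x[n]
        self.delay_line.write(v)
        return multiply_add(-self.gain, v, delayed)   # y[n] = -g*v[n] + v[n-D]

    def process_block(self, block):
        return [self.process_sample(x) for x in block]
```

With `delay=2` and `gain=0.5`, feeding the impulse `[1.0, 0.0, 0.0, 0.0, 0.0]` through `process_block` yields `[-0.5, 0.0, 0.75, 0.0, 0.375]`, matching the expected impulse response of -g at n = 0 followed by (1 - g²)·g^(k-1) at n = kD. Reverberators such as Schroeder's are typically composed by chaining sections like this one with comb filters.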
The innovative work in CARROUSO later found its logical continuation in the MAP project, where the developed 3D API was used to integrate custom or standardized 3D audio models into a professional sound production toolbox running on mixed general-purpose/FPGA/multimedia-DSP technology to obtain true real-time performance. In addition to local monitoring of the multi-channel output, this tool can generate and encode object-oriented audio content compatible with the MPEG-4 BIFS format, which turns it into a sophisticated MPEG-4 authoring tool. In conclusion, four years of research and work during the CARROUSO and MAP projects have shown in practice that the proposed 3D audio development framework is not only flexible and easily configurable, but can also be very efficient and can be ported to different platforms without significant effort.