Publications related to Audio editing software

SVGC-AVA: 360-Degree Video Saliency Prediction With Spherical Vector-Based Graph Convolution and Audio-Visual Attention

Pascal Frossard, Chenglin Li, Li Wei, Qin Yang, Yuelei Li

Viewers of 360-degree videos are provided with both visual modality to characterize their surrounding views and audio modality to indicate the sound direction. Though both modalities are important for saliency prediction, little work has been done by joint ...

Ieee-Inst Electrical Electronics Engineers Inc, 2024

Concurrent Evolution of Biomechanical and Physiological Parameters With Running-Induced Acute Fatigue

Kamiar Aminian, Anisoara Ionescu, Salil Apte, Gaëlle Prigent, Vincent Gremeaux

Understanding the influence of running-induced acute fatigue on the homeostasis of the body is essential to mitigate the adverse effects and optimize positive adaptations to training. Fatigue is a multifactorial phenomenon, which influences biomechanical, ...

FRONTIERS MEDIA SA, 2022

Radical Intangibles: Materializing the Ephemeral

Sarah Irene Brutton Kenderdine, Lillian Hibberd, Jeffrey Shaw

New materialism considers that the world and its histories are produced by a range of material forces that extend from the physical and the biological to the psychological, social and cultural. In recognizing that heritage is not held in objects alone, new ...

2021

Multi-task Neural Network for Robust Multiple Speaker Embedding Extraction

Jean-Marc Odobez, Petr Motlicek, Weipeng He

This paper introduces a novel approach for extracting speaker embeddings from audio mixtures of multiple overlapping voices. This approach is based on a multi-task neural network. The network first extracts a latent feature for each direction. This feature ...

ISCA-INT SPEECH COMMUNICATION ASSOC, 2021

ASAP: a Dataset of Aligned Scores and Performances for Piano Transcription

Andrew Philip McLeod

In this paper we present Aligned Scores and Performances (ASAP): a new dataset of 222 digital musical scores aligned with 1068 performances (more than 92 hours) of Western classical piano music. The scores are provided as paired MusicXML files and quantized ...

2020

Design Patterns for Resource-Constrained Automated Deep-Learning Methods

Prakhar Gupta

We present an extensive evaluation of a wide variety of promising design patterns for automated deep-learning (AutoDL) methods, organized according to the problem categories of the 2019 AutoDL challenges, which set the task of optimizing both model accurac ...

2020

Deep Learning of Human Perception in Audio Event Classification

Samuel Denys Beuret, Yi Yu

In this paper, we introduce our recent studies on human perception in audio event classification. In particular, the pre-trained model VGGish is used as feature extractor to process audio data, and DenseNet is trained by and used as feature extractor for o ...

IEEE, 2018

System fusion and speaker linking for longitudinal diarization of TV shows

Hervé Bourlard, Petr Motlicek

Performing speaker diarization while uniquely identifying the speakers in a collection of audio recordings is a challenging task. Based on our previous work on speaker diarization and linking, we developed a system for diarizing longitudinal TV show data s ...

IEEE, 2016

Audio wave field encoding

Martin Vetterli

An encoder/decoder for multi-channel audio data, and in particular for audio reproduction through wave field synthesis. The encoder comprises a two-dimensional filter-bank to the multi-channel signal, in which the channel index is treated as an independent ...

2012

Learning dictionaries of spatial and temporal EEG primitives for brain-computer interfaces

José del Rocio Millán Ruiz, Ricardo Andres Chavarriaga Lozano, Benjamin Hamner

Sparse methods are widely used in image and audio processing for denoising and classification, but there have been few previous applications to neural signals for brain-computer interfaces (BCIs). We used the dictionary-learning algorithm K-SVD, coupled w ...

2011

Learning bimodal structure in audio-visual data

Pierre Vandergheynst, Gianluca Monaci

A novel model is presented to learn bimodally informative structures from audio-visual signals. The signal is represented as a sparse sum of audio-visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform an ...

2009

Selecting relevant visual features for speechreading

Jean-Philippe Thiran, Mihai Gurban, Virginia Estellers Casas

A quantitative measure of relevance is proposed for the task of constructing visual feature sets which are at the same time relevant and compact. A feature's relevance is given by the amount of information that it contains about the problem, while compactn ...

2009

Blind Audiovisual Source Separation Using Sparse Representations

Pierre Vandergheynst, Gianluca Monaci, Anna Llagostera Casanovas

In this work we present a method to jointly separate active audio and visual structures on a given mixture. Blind Audiovisual Source Separation is achieved exploiting the coherence between a video signal and a one-microphone audio track. The efficient repr ...

2007

Multimodal Speaker Localization in a Probabilistic Framework

Jean-Philippe Thiran, Mihai Gurban

A multimodal probabilistic framework is proposed for the problem of finding the active speaker in a video sequence. We localize the current speaker's mouth in the image by using the video and the audio channels together. We propose a novel visual feature t ...

IEEE, 2006