Publication: Text as signal. A tutorial with case studies focusing on social media (Twitter)

Abstract

Sentiment analysis is the automated coding of emotions expressed in text. Sentiment analysis and other types of analyses focusing on the automatic coding of textual documents are increasingly popular in psychology and computer science. However, the potential of treating automatically coded text collected at regular sampling intervals as a signal is currently overlooked. We use the phrase "text as signal" to refer to the application of signal processing techniques to coded textual documents sampled at regular intervals. To illustrate this potential, we introduce the reader to a variety of such techniques in a tutorial with two case studies in the realm of social media analysis. First, we apply finite impulse response (FIR) filtering to emotion-coded tweets posted during the US Election Week of 2020 and discuss the visualization of the resulting variation in the filtered signal. We use changepoint detection to highlight the important changes in the emotional signals. Then we apply data interpolation, analysis of periodicity via the fast Fourier transform (FFT), and FFT filtering to personal value-coded tweets from November 2019 to October 2020 and link the variation in the filtered signal to some of the epoch-defining events occurring during this period. Finally, we use block bootstrapping to estimate the variability/uncertainty in the resulting filtered signals. After working through the tutorial, readers will understand the basics of using signal processing to analyze regularly sampled coded text.
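Two of the techniques named in the abstract, FIR filtering and periodicity analysis via the FFT, can be sketched on a synthetic series. This is a minimal illustration and not the authors' code: the signal, the 4-tap moving-average filter, and the helper names `fir_filter`, `fft`, and `dominant_period` are all assumptions made for the example, and no real tweet data is involved.

```python
import cmath
import math


def fir_filter(signal, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    return [
        sum(h * signal[n - k] for k, h in enumerate(taps) if n - k >= 0)
        for n in range(len(signal))
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dominant_period(signal):
    """Period (in samples) of the strongest non-DC frequency bin."""
    n = len(signal)
    mean = sum(signal) / n
    spectrum = fft([v - mean for v in signal])
    k = max(range(1, n // 2 + 1), key=lambda i: abs(spectrum[i]))
    return n / k


# Hypothetical regularly sampled "coded text" score: a slow cycle of
# 32 samples plus a fast oscillation of 4 samples standing in for noise.
N = 256
slow = [math.sin(2 * math.pi * n / 32) for n in range(N)]
fast = [0.3 * math.sin(2 * math.pi * n / 4) for n in range(N)]
signal = [s + f for s, f in zip(slow, fast)]

# A 4-tap moving average is a simple FIR low-pass filter; it averages
# over exactly one cycle of the fast oscillation and so suppresses it.
taps = [0.25] * 4
smoothed = fir_filter(signal, taps)

period = dominant_period(smoothed)
print(period)
```

In this toy setup the moving average removes the 4-sample oscillation after the first three transient samples, and the FFT still reports the 32-sample cycle as the dominant period. Real data would also call for windowing, interpolation of missing samples, and uncertainty estimates such as the block bootstrap mentioned in the abstract.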

Related concepts (24)

Signal processing

Signal processing is the discipline that develops and studies techniques for the processing, analysis, and interpretation of signals. Among the types of operations possible on these signals, one can mention contr…

Case study

The case study is a method used in qualitative studies in the humanities and social sciences, in psychology, or in psychoanalysis, but it can also be used in studies to look into…

Fourier transform

In mathematics, more precisely in analysis, the Fourier transform is an extension, for non-periodic functions, of the expansion into Fourier series…

Related publications (42)

Francisco Pereira Correia Pinto

Sound waves propagate through space and time by transference of energy between the particles in the medium, which vibrate according to the oscillation patterns of the waves. These vibrations can be captured by a microphone and translated into a digital signal, representing the amplitude of the sound pressure as a function of time. The signal obtained by the microphone characterizes the time-domain behavior of the acoustic wave field, but has no information related to the spatial domain. The spatial information can be obtained by measuring the vibrations with an array of microphones distributed at multiple locations in space. This allows the amplitude of the sound pressure to be represented not only as a function of time but also as a function of space. The use of microphone arrays creates a new class of signals that is somewhat unfamiliar to Fourier analysis. Current paradigms try to circumvent the problem by treating the microphone signals as multiple "cooperating" signals, and applying the Fourier analysis to each signal individually. Conceptually, however, this is not faithful to the mathematics of the wave equation, which expresses the acoustic wave field as a single function of space and time, and not as multiple functions of time. The goal of this thesis is to provide a formulation of Fourier theory that treats the wave field as a single function of space and time, and allows it to be processed as a multidimensional signal using the theory of digital signal processing (DSP). We base this on a physical principle known as the Huygens principle, which essentially says that the wave field can be sampled at the surface of a given region in space and subsequently reconstructed in the same region, using only the samples obtained at the surface. To translate this into DSP language, we show that the Huygens principle can be expressed as a linear system that is both space- and time-invariant, and can be formulated as a convolution operation. 
If the input signal is transformed into the spatio-temporal Fourier domain, the system can also be analyzed according to its frequency response. In the first half of the thesis, we derive theoretical results that express the 4-D Fourier transform of the wave field as a function of the parameters of the scene, such as the number of sources and their locations, the source signals, and the geometry of the microphone array. We also show that the wave field can be effectively analyzed on a small scale using what we call the space/time-frequency representation space, consisting of a Gabor representation across the spatio-temporal manifold defined by the microphone array. These results are obtained by treating the signals as continuous functions of space and time. The second half of the thesis is dedicated to processing the wave field in discrete space and time, using Nyquist sampling theory and multidimensional filter bank theory. In particular, we show examples of orthogonal filter banks that effectively represent the wave field in terms of its elementary components while satisfying the requirements of critical sampling and perfect reconstruction of the input. We discuss the architecture of such filter banks, and demonstrate their applicability in the context of real applications, such as spatial filtering and wave field coding.
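The idea of treating the array output as one function of space and time can be illustrated with a deliberately simplified case: a single far-field plane wave sampled by a uniform linear array, transformed with a 2-D discrete Fourier transform. This sketch is not taken from the thesis. The array geometry, sampling rate, source frequency, arrival angle, and all names below are assumed values chosen so that the result falls exactly on DFT bins.

```python
import cmath
import math

C = 340.0                    # speed of sound in m/s (assumed)
FS = 8000.0                  # temporal sampling rate in Hz (assumed)
D = 0.085                    # microphone spacing in m (assumed)
N_MICS, N_SAMPLES = 16, 64   # array size and record length (assumed)
F0 = 1000.0                  # source frequency in Hz (assumed)
THETA = math.radians(60.0)   # arrival angle relative to the array axis (assumed)


def plane_wave(m, n):
    """Pressure at microphone m and time sample n for a far-field plane wave."""
    t = n / FS
    x = m * D
    return math.cos(2 * math.pi * F0 * (t - x * math.cos(THETA) / C))


def dft(seq):
    """Naive O(N^2) discrete Fourier transform of a 1-D sequence."""
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


# One function of space (rows) and time (columns).
field = [[plane_wave(m, n) for n in range(N_SAMPLES)] for m in range(N_MICS)]

# Separable 2-D DFT: first along time, then along space.
rows = [dft(row) for row in field]
spectrum = [[0j] * N_SAMPLES for _ in range(N_MICS)]
for kt in range(N_SAMPLES):
    col = dft([rows[m][kt] for m in range(N_MICS)])
    for kx in range(N_MICS):
        spectrum[kx][kt] = col[kx]

# Strongest bin among the non-negative temporal frequencies.
kx_peak, kt_peak = max(
    ((kx, kt) for kx in range(N_MICS) for kt in range(N_SAMPLES // 2)),
    key=lambda b: abs(spectrum[b[0]][b[1]]),
)
f_peak = kt_peak * FS / N_SAMPLES
kx_signed = kx_peak - N_MICS if kx_peak >= N_MICS // 2 else kx_peak
spatial_freq = kx_signed / (N_MICS * D)  # cycles per metre

print(f_peak, spatial_freq)
```

Under these assumptions the energy concentrates at the temporal frequency `F0` and at a spatial frequency of magnitude `F0 * cos(THETA) / C`, so the arrival angle can be read off the position of the peak in the joint spectrum rather than from each microphone signal separately.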

The theme of this thesis revolves around three important manifestations of light, namely its corpuscular, wave and electromagnetic nature. Our goal is to exploit these principles to analyze, design and build imaging modalities by developing new signal processing and algorithmic tools, based in particular on sampling and sparsity concepts.
First, we introduce a new sampling scheme called variable pulse width, which is based on the finite rate of innovation (FRI) sampling paradigm. This new framework makes it possible to sample and perfectly reconstruct weighted sums of Lorentzians; perfect reconstruction from sampled signals is guaranteed by a set of theorems.
Second, we turn to the context of light and study its reflection, which is based on the corpuscular model of light. More precisely, we propose to use our FRI-based model to represent bidirectional reflectance distribution functions. We develop dedicated light domes to acquire reflectance functions and use the measurements obtained to demonstrate the usefulness and versatility of our model. In particular, we concentrate on the representation of specularities, which are sharp and bright components generated by the direct reflection of light on surfaces.
Third, we explore the wave nature of light through Lippmann photography, a century-old photography technique that acquires the entire spectrum of visible light. This fascinating process captures interference patterns created by the exposed scene inside the depth of a photosensitive plate. By illuminating the developed plate with a neutral light source, the reflected spectrum corresponds to that of the exposed scene. We propose a mathematical model which precisely explains the technique and demonstrate that the spectrum reproduction suffers from a number of distortions due to the finite depth of the plate and the choice of reflector. In addition to describing these artifacts, we describe an algorithm to invert them, essentially recovering the original spectrum of the exposed scene.
Next, the wave nature of light is further generalized to the electromagnetic theory, which we invoke to leverage the concept of polarization of light. We also return to the topic of the representation of reflectance functions and focus this time on the separation of the specular component from the other reflections. We exploit the fact that the polarization of light is preserved in specular reflections and investigate camera designs with polarizing micro-filters with different orientations placed just in front of the camera sensor; the different polarizations of the filters create a mosaic image, from which we propose to extract the specular component. We apply our demosaicing method to several scenes and additionally demonstrate that our approach improves photometric stereo.
Finally, we delve into the problem of retrieving the phase information of a sparse signal from the magnitude of its Fourier transform. We propose an algorithm that resolves the phase retrieval problem for sparse signals in three stages. Unlike traditional approaches that recover a discrete approximation of the underlying signal, our algorithm estimates the signal on a continuous domain, which makes it the first of its kind.
The concluding chapter outlines several avenues for future research, like new optical devices such as displays and digital cameras, inspired by the topic of Lippmann photography.

This thesis focuses on developing efficient algorithmic tools for processing large datasets. In many modern data analysis tasks, the sheer volume of available datasets far outstrips our abilities to process them. This scenario commonly arises in tasks including parameter tuning of machine learning models (e.g., Google Vizier) and training neural networks. These tasks often require solving numerical linear algebraic problems on large matrices, making the classical primitives prohibitively expensive. Hence, there is a crucial need to efficiently compress the available datasets. In other settings, even collecting the input dataset is extremely expensive, making it vital to design optimal data sampling strategies. This is common in applications such as MRI acquisition and spectrum sensing.
The fundamental questions above are often dual to each other, and hence can be addressed using the same set of core techniques. Indeed, exploiting structured Fourier sparsity is a recurring source of efficiency in this thesis, leading to both fast numerical linear algebra methods and sample efficient data acquisition schemes.
One of the main results that we present in this thesis is the first Sublinear-time Model-based Sparse FFT algorithm that achieves a nearly optimal sample complexity for recovery of every signal whose Fourier transform is well approximated by a small number of blocks (such structure is common, e.g., in spectrum sensing). Our method matches in sublinear time the result of Baraniuk et al. (2010), which started the field of model-based compressed sensing. Another highlight of this thesis is the first Dimension-independent Sparse FFT algorithm that computes the Fourier transform of a sparse signal in sublinear runtime in any dimension. This is the first algorithm that, just like the FFT of Cooley and Tukey, is dimension independent and avoids the curse of dimensionality inherent to all previously known techniques. Finally, we give a Universal Sampling Scheme for the reconstruction of structured Fourier signals from continuous measurements. Our approach matches the classical results of Slepian, Pollak, and Landau (1960s) on the reconstruction of bandlimited signals via Prolate Spheroidal Wave Functions and extends these results to a wide class of Fourier structure types.
Besides having classical applications in signal processing and data analysis, Fourier techniques have been at the core of many machine learning tasks such as Kernel Matrix Approximation. The second half of this thesis is dedicated to finding compressed and low-rank representations of kernel matrices, which are the primary means of computation with large kernel matrices in machine learning. We build on Fourier techniques and achieve spectral approximation guarantees to the Gaussian kernel using an optimal number of samples, significantly improving upon the classical Random Fourier Features of Rahimi and Recht (2008). Finally, we present a nearly-optimal Oblivious Subspace Embedding for high-degree Polynomial kernels which leads to nearly-optimal embeddings of the high-dimensional Gaussian kernel. This is the first result that does not suffer from an exponential loss in the degree of the polynomial kernel or the dimension of the input point set, providing exponential improvements over the prior works, including the TensorSketch of Pagh (2013) and application of the celebrated Fast Multipole Method of Greengard and Rokhlin (1986) to the kernel approximation problem.
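For context, the classical Random Fourier Features baseline that the abstract says it improves upon can be sketched in a few lines. This shows only the standard construction for the Gaussian kernel, not the thesis's improved scheme. The function names, the sample points, the bandwidth `sigma`, and the number of features are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def random_fourier_features(points, n_features, sigma, seed=0):
    """Map each point x to z(x) so that z(x) . z(y) approximates k(x, y).

    Frequencies are drawn from N(0, I / sigma^2) and phases from
    U[0, 2*pi); each feature is sqrt(2 / n_features) * cos(w . x + b).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    freqs = [
        [rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
        for _ in range(n_features)
    ]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return [
        [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)
        ]
        for x in points
    ]


# Hypothetical 2-D input points and bandwidth.
points = [(0.0, 0.0), (0.5, -0.2), (1.0, 1.0)]
sigma = 1.0
features = random_fourier_features(points, n_features=4000, sigma=sigma)

# Compare the feature inner products against the exact kernel values.
max_error = 0.0
for i, x in enumerate(points):
    for j, y in enumerate(points):
        approx = sum(a * b for a, b in zip(features[i], features[j]))
        max_error = max(max_error, abs(approx - gaussian_kernel(x, y, sigma)))

print(max_error)
```

The approximation error of this baseline shrinks roughly with the inverse square root of the number of random features, which is why reducing the number of samples needed for a given guarantee, as the abstract describes, matters for large kernel matrices.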