# Joint Fusion Learning of Multiple Time Series Prediction

Abstract

Accurate traffic density estimation is essential for many purposes, such as developing successful transit policies or forecasting future traffic conditions for navigation. Recent developments in machine learning and computer systems give the transportation industry many opportunities to improve its operations by analysing traffic flow sensor data. However, although state-of-the-art time series forecasting algorithms perform well on some transportation problems, they still fail on some critical tasks. In particular, existing traffic flow forecasting methods that do not exploit causal relations between different data sources remain unsatisfactory for many real-world applications. In this report, we focus on a new method named joint fusion learning, which exploits the underlying causality in time series. We test our method in a detailed synthetic environment that we developed specifically to imitate a real-world traffic flow dataset. Finally, we apply joint fusion learning to a historical traffic flow dataset for Thessaloniki, Greece, published by the Hellenic Institute of Transport (HIT). We obtained better short-term forecasts than widely used benchmark models that use a single time series to forecast the future.
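The abstract does not detail how joint fusion learning works. The following is therefore only a minimal sketch of the underlying intuition, built on our own assumptions. A synthetic upstream sensor `x` drives a downstream sensor `y` with a two-step delay. We compare one-step forecasts from a single-series autoregressive baseline with those from a linear model that also receives the lagged upstream series. The generator, the coefficients, the lag and every function name are hypothetical, and this is not the report's method.

```python
import random

def fit_least_squares(rows, targets):
    """Ordinary least squares: solve (X^T X) b = X^T y by Gaussian elimination."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        coef[i] = (a[i][k] - sum(a[i][j] * coef[j] for j in range(i + 1, k))) / a[i][i]
    return coef

def make_synthetic(n, lag, seed=0):
    """Hypothetical setup (not the HIT data): upstream x drives downstream y after `lag` steps."""
    rng = random.Random(seed)
    x, y = [0.0] * n, [0.0] * n
    for t in range(1, n):
        x[t] = 0.8 * x[t - 1] + rng.gauss(0, 1)
        driver = x[t - lag] if t >= lag else 0.0
        y[t] = 0.3 * y[t - 1] + 0.9 * driver + rng.gauss(0, 0.3)
    return x, y

def mse_one_step(x, y, lag, use_x, split):
    """Fit on the first `split` samples; return one-step-ahead MSE on the rest (lag >= 1)."""
    rows, targets = [], []
    for t in range(lag, len(y)):
        feats = [1.0, y[t - 1]]        # single-series baseline: intercept + AR(1)
        if use_x:
            feats.append(x[t - lag])   # fused model: also use the lagged upstream value
        rows.append(feats)
        targets.append(y[t])
    coef = fit_least_squares(rows[:split], targets[:split])
    errors = [(sum(c * f for c, f in zip(coef, r)) - t) ** 2
              for r, t in zip(rows[split:], targets[split:])]
    return sum(errors) / len(errors)

if __name__ == "__main__":
    x, y = make_synthetic(2000, lag=2)
    single = mse_one_step(x, y, lag=2, use_x=False, split=1500)
    fused = mse_one_step(x, y, lag=2, use_x=True, split=1500)
    print(f"single-series MSE = {single:.3f}, fused MSE = {fused:.3f}")
```

The upstream value is already known `lag` steps in advance. The fused model can therefore explain the part of `y[t]` that the single-series model can only treat as noise.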


This page is generated automatically and may contain information that is incorrect, incomplete, out of date or not relevant to your search. The same applies to all other pages on this site. Be sure to verify the information with official EPFL sources.

Related concepts (12)

Time series

Example of a data visualization showing medium- and long-term warming trends, based on temperature time series by country (here grouped by continent, from north

Data fusion

Fusing two data sources (dimension #1 and dimension #2) can yield a better classification than one based on dimension #1 or dimension #2 alone.

Machine learning

Machine learning (also called artificial learning or statistical learning; in French « apprentissage automatique » or « apprentissage machine ») is

Related publications (3)


Nowadays, physiological monitoring is imperative for the safety of medical operations. However, systems that monitor the depth of anaesthesia are still far from reliable, so some patients may still experience the trauma of remaining conscious under general anaesthesia during surgery. The long-term goal of our interdisciplinary project "BRACCIA" was to develop a device to measure the depth of anaesthesia. With this goal in view, the main research objective was to establish how the couplings between the cardiac, respiratory and cortical oscillations change under anaesthesia. Within this project, our objectives were as follows. 1) The first objective was to detect the change between deep and light anaesthesia in experimental recordings from rats. We also aimed to investigate the interdependencies among three physiological activities, for each state of anaesthetic depth, in experimental recordings from rats and humans. These activities are cardiac activity (H), respiration (R) and cortical activity (B). 2) The second objective was to model the slow brain waves and to study the effect of anaesthesia on this model. The recordings were analysed with five methods. The first method is the "S-estimator", which indirectly quantifies the amount of synchronization within a data set by measuring the contraction of the embedding dimension of the state space. The second method is the "new S-estimator". In this method, a linear transformation of the reconstructed state-space trajectory orthonormalizes the state variables within each model. The reduction of the global state-space volume then measures synchronization exclusively between the different models. The third method is the "embedding dimension analysis", which examines the time evolution of the embedding dimension obtained with the false nearest neighbours method on each windowed time series. The last two methods analyse the dependencies among the three systems in more detail.
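The first method can be illustrated with a simplified, zero-embedding form of the S-estimator. We assume here that it equals one minus the normalized entropy of the eigenvalues λᵢ of the channels' correlation matrix: S = 1 + Σ λ'ᵢ ln λ'ᵢ / ln M, with λ'ᵢ = λᵢ / M for M channels. This gives S = 0 for uncorrelated channels and S = 1 for fully synchronized ones. The thesis works on reconstructed (delay-embedded) state spaces, which this sketch omits. The function names and the use of the cyclic Jacobi eigenvalue method are our own choices.

```python
import math

def correlation_matrix(channels):
    """Pearson correlation matrix of equally long channels (one list per channel)."""
    n = len(channels[0])
    z = []
    for ch in channels:
        mu = sum(ch) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in ch) / n)
        z.append([(v - mu) / sd for v in ch])
    m = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(m)]
            for i in range(m)]

def jacobi_eigenvalues(a, max_sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    m = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j)
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- J^T A J, which zeroes a[p][q] and a[q][p].
                for k in range(m):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(m):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(m)]

def s_estimator(channels):
    """Zero-embedding S-estimator: 0 = uncorrelated channels, 1 = fully synchronized."""
    m = len(channels)
    entropy = 0.0
    for lam in jacobi_eigenvalues(correlation_matrix(channels)):
        p = lam / m              # normalized eigenvalue; the eigenvalues sum to m
        if p > 0:                # treat 0 * log(0) as 0 and skip round-off negatives
            entropy -= p * math.log(p)
    return 1.0 - entropy / math.log(m)
```

For example, two identical channels give S ≈ 1, while two exactly uncorrelated channels such as `[1, -1, 1, -1]` and `[1, 1, -1, -1]` give S = 0.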
The fourth method is the "coupling matrix" (CM). It infers linear interactions between multivariate time series after separately constructing the self-model of each signal from the reconstructed states. The last method is "nonparametric Granger causality" (GC), which measures bivariate causal influence in the frequency domain. A nonparametric estimation approach was used here to avoid difficulties such as uncertainty in the model parameters. Permutation tests were added to identify genuine causality. In most groups of experimental recordings, we found a change in the synchronization within the whole system, either between deep and light anaesthesia or between the resting state and anaesthesia. In Ketamine-Xylazine (KX) anaesthetized rats, the decrease in synchronization was so clear that we succeeded in detecting the deep-light transition of the anaesthesia, and did so automatically in one group of recordings. Changes in synchronization exclusively between the three subsystems, as measured with the "new S-estimator", were found only in pentobarbital-anaesthetized rats. Interestingly, a change in the time evolution of the embedding dimension was found for B and H in most of the recordings. Furthermore, some changes in the couplings were found according to the depth of anaesthesia. In KX-anaesthetized rats and in humans, a change in coupling direction was also observed. The nonparametric GC agreed with some of the causality changes obtained with the coupling matrix, but some disagreements remained. From our literature study of the rhythmic activity of the brain, we conclude that the neuroscience community generally accepts that EEG rhythms are generated by the interaction between the cortex and the thalamus.
Consequently, we chose the model of Bazhenov et al., a Hodgkin-Huxley-based model that considers thalamocortical assemblies. This model includes four layers of neurons. Two layers represent the thalamus (RE: thalamic reticular neurons; TC: thalamic relay neurons), and the other two represent the cortex (PY: pyramidal neurons; IN: interneurons). Despite some difficulties in setting the parameters and equations, we succeeded in reproducing and simulating this model. We then ran simulations in which we varied either the maximal conductance of the Ca²⁺ channels or the closing/opening rate of the GABA-A receptors from one run to the next. For each run, we determined how these parameter changes modify the collective behaviour of the PY neurons, which should be interpreted as a field potential analogous to the EEG. Finally, to understand the model itself, we carried out several studies on the reproduced model. Observing the collective spiking activity of the PY cells in the reproduced Bazhenov model, we confirmed the presence of slow oscillations, namely δ-waves. The model's response to changes in the anaesthetic-related parameters showed that every parameter change effectively reinforced the inhibition that suppresses spiking activity, but to a different degree in each case. Studies of the model size showed that the local synchrony and the mean-field frequency are not influenced by the size of the network, whereas the global synchrony is not preserved at larger sizes. In conclusion, our data analysis shows a clear possibility of detecting the depth of anaesthesia. It also shows a clear change in the interdependencies depending on the anaesthesia. Moreover, our modelling study took an essential step towards investigating the effect of anaesthesia on a brain model.
As future work, all the unsolved problems that appear in the analysis chapter of this thesis should be addressed, and the embedding dimension analysis in particular should be followed up in detail. For the modelling, since we have only taken the first step, we should continue to deepen our understanding of the model and to close the gap between the mathematical model and reality.
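The thesis estimates Granger causality nonparametrically in the frequency domain, which is beyond the scope of a short example. The sketch below swaps in a simpler, linear, time-domain statistic: the log ratio of residual sums of squares of a lag-1 model of `y` without and with the lagged values of `x`. It pairs this statistic with the kind of permutation test the abstract mentions. Shuffling `x` destroys any temporal relation between `x` and `y`, so statistics computed on shuffled copies form an empirical null distribution. The model order, the number of permutations and all names are our assumptions.

```python
import math
import random

def ols_rss(rows, targets):
    """Residual sum of squares of an ordinary least-squares fit (normal equations)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (a[i][k] - sum(a[i][j] * coef[j] for j in range(i + 1, k))) / a[i][i]
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, t in zip(rows, targets))

def granger_stat(x, y):
    """ln(RSS restricted / RSS full) for lag-1 linear models of y; >= 0 up to round-off."""
    targets = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    return math.log(ols_rss(restricted, targets) / ols_rss(full, targets))

def permutation_pvalue(x, y, n_perm=199, seed=0):
    """Observed statistic and the share of shuffled-x statistics at least as large."""
    rng = random.Random(seed)
    observed = granger_stat(x, y)
    hits = 0
    for _ in range(n_perm):
        shuffled = x[:]
        rng.shuffle(shuffled)
        if granger_stat(shuffled, y) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value suggests that the past of `x` helps predict `y` beyond what the past of `y` alone provides. Swapping the arguments tests the opposite direction.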

Karl Aberer, Samuel François Roger Joseph Humeau, Matteo Vasirani, Tri Kurniawan Wijaya

The recent development of smart meters has made it possible to analyse household electricity consumption in real time. Predicting electricity consumption at such fine scales should help increase the efficiency of distribution networks and energy pricing. However, this is by no means a trivial task, since household-level consumption is much more irregular than consumption at the transmission or distribution levels. In this work, we address the problem of improving consumption forecasting by using the statistical relations between consumption series. We do this at both the household and district scales (hundreds of houses), using machine learning techniques such as support vector regression (SVR) and multilayer perceptrons (MLP). First, we determine which algorithm is best suited to each scale. Then, we try to find leaders among the time series to help short-term forecasting. We also improve the forecasting of district consumption by clustering houses according to their consumption profiles.
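One simple way to look for "leaders" among consumption series is to scan lagged Pearson correlations. This is shown here as an assumption, not as the authors' procedure. Series `a` is taken to lead a target series `b` by `k` steps when `a[t - k]` correlates most strongly with `b[t]` for some `k ≥ 1`. The function names, the maximum lag and the ranking rule are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def best_lead(a, b, max_lag):
    """Return (lag, corr) maximizing corr(a[t - lag], b[t]) over lag = 1..max_lag."""
    best = (0, float("-inf"))
    for lag in range(1, max_lag + 1):
        r = pearson(a[:-lag], b[lag:])   # pairs (a[t - lag], b[t]) for t = lag..n-1
        if r > best[1]:
            best = (lag, r)
    return best

def find_leaders(series, target, max_lag=4):
    """Rank the other series by how well their past values correlate with the target."""
    scores = []
    for name, values in series.items():
        if name == target:
            continue
        lag, r = best_lead(values, series[target], max_lag)
        scores.append((r, name, lag))
    scores.sort(reverse=True)
    return [(name, lag, r) for r, name, lag in scores]
```

The top-ranked series and its lag could then be fed as extra inputs to a short-term forecaster such as an SVR or MLP.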

In recent years, time series data have become ubiquitous thanks to affordable sensors and advances in embedded technology. Large amounts of time series data are continuously produced in a wide spectrum of applications, such as sensor networks and medical monitoring. The availability of such large-scale time series data highlights the importance of scalable data management and of efficient querying and analysis. Meanwhile, in the online setting, time series carry invaluable information about the real-time status of the monitored entities or phenomena. This calls for online time series mining to support timely decision making or event detection. In this thesis, we aim to address these issues with scalable and distributed analytics techniques for massive time series data. Concretely, the thesis is centered on the following three topics. As the number of sensors that pervade our lives increases significantly (e.g., environmental sensors, mobile phone sensors, IoT applications), the efficient management of the massive amounts of time series they produce is becoming increasingly important. The infinite nature of sensor data poses a serious challenge for query processing, even in a cloud infrastructure. Traditional raw sensor data management systems based on relational databases lack the scalability to accommodate large-scale sensor data efficiently. Thus, distributed key-value stores in the cloud are becoming a prime tool for managing sensor data. However, there are currently no techniques for indexing and/or query optimization of model-view sensor time series data in the cloud. In Chapter 2, we propose an innovative index for modeled segments in key-value stores, namely the KVI-index. The KVI-index consists of two interval indices, one on the time dimension and one on the sensor value dimension. Each has an in-memory search tree and a secondary list materialized in the key-value store.
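The abstract describes the KVI-index only at a high level. As a rough, hypothetical illustration of the kind of query it serves, the sketch below keeps modeled segments in a plain dictionary standing in for the key-value store. Each segment is a linear model valid over a time interval. The sketch builds one in-memory interval index per dimension (time and value) and intersects the two candidate sets. It then refines the candidates by evaluating each model over the queried time window. The sorted list searched with `bisect` is a simple substitute for the paper's in-memory search trees and materialized secondary lists. The segment format `(t0, t1, intercept, slope)` and all names are our assumptions.

```python
import bisect

class IntervalIndex:
    """Static interval index: intervals sorted by start; a query scans those that
    start at or before the query end and keeps the ones that overlap it."""

    def __init__(self, intervals):
        self.items = sorted(intervals)            # (start, end, key) tuples
        self.starts = [s for s, _, _ in self.items]

    def overlapping(self, lo, hi):
        stop = bisect.bisect_right(self.starts, hi)
        return {key for s, e, key in self.items[:stop] if e >= lo}

def build(segments):
    """segments: dict key -> (t0, t1, intercept, slope), value(t) = intercept + slope*(t - t0)."""
    store = dict(segments)  # stands in for the distributed key-value store
    time_iv, value_iv = [], []
    for key, (t0, t1, b, m) in store.items():
        v0, v1 = b, b + m * (t1 - t0)
        time_iv.append((t0, t1, key))
        value_iv.append((min(v0, v1), max(v0, v1), key))
    return store, IntervalIndex(time_iv), IntervalIndex(value_iv)

def query(index, t_lo, t_hi, v_lo, v_hi):
    """Keys of segments whose modeled values enter [v_lo, v_hi] during [t_lo, t_hi]."""
    store, by_time, by_value = index
    candidates = by_time.overlapping(t_lo, t_hi) & by_value.overlapping(v_lo, v_hi)
    hits = []
    for key in sorted(candidates):
        t0, t1, b, m = store[key]
        a, z = max(t0, t_lo), min(t1, t_hi)
        # A linear model attains its extremes at the ends of the clipped window.
        va, vz = b + m * (a - t0), b + m * (z - t0)
        if min(va, vz) <= v_hi and max(va, vz) >= v_lo:
            hits.append(key)
    return hits
```

The refinement step is needed because a segment can overlap the time range and the value range separately without taking values in the queried range during the queried time window.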
The dramatic increase in the availability of data streams fuels the development of many distributed real-time computation engines (e.g., Storm, Samza, Spark Streaming, S4). In Chapter 3, we focus on a fundamental time series mining task in this new computation paradigm: continuously mining dynamic (lagged) correlations in time series via a distributed real-time computation engine. Correlations reveal hidden temporal interactions across time series and are widely used in scientific data analysis, data-driven event detection, financial markets and other areas. We propose the P2H framework, which consists of parallelism-partitioning-based data shuffling and hypercube-structure-based computation pruning. It improves both communication and computation efficiency when mining correlations in the distributed context. In numerous real-world applications, large datasets collected from observations and measurements of physical entities are inevitably noisy and contain outliers. These outliers can dramatically degrade the performance of standard distributed machine learning approaches such as regression trees. In Chapter 4, we present a novel distributed regression tree approach for handling large and noisy datasets. It uses robust regression statistics, i.e., statistics that are more robust to outliers. We then present an adaptive gradient learning method for recurrent neural networks (RNNs) to forecast streaming time series in the presence of both outliers and change points.
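To illustrate why robust statistics help regression trees on noisy data, here is a minimal, single-machine sketch. It is not the distributed method of Chapter 4. It chooses the best threshold on one feature in one of two ways. The first is the usual squared-error criterion around the leaf mean. The second is a robust criterion, the absolute error around the leaf median. Taking the median and absolute deviation as "the robust statistic", and all names, are our assumptions.

```python
import statistics

def sse(values):
    """Squared error around the mean: the usual regression-tree impurity."""
    mu = statistics.mean(values)
    return sum((v - mu) ** 2 for v in values)

def sad(values):
    """Absolute error around the median: an impurity that is robust to outliers."""
    med = statistics.median(values)
    return sum(abs(v - med) for v in values)

def best_split(xs, ys, impurity):
    """Threshold on one feature minimizing impurity(left) + impurity(right)."""
    pairs = sorted(zip(xs, ys))
    best_cost, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = impurity(left) + impurity(right)
        if cost < best_cost:
            best_cost, best_threshold = cost, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold
```

Consider a step signal `ys = [0, 0, 1000, 0, 0, 10, 10, 10, 10, 10]` over `xs = range(10)` with one outlier. The squared-error criterion is pulled towards the outlier and splits at 2.5. The median-based criterion keeps the true change point at 4.5.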