**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Publication# Portfolio construction under information asymmetry

Résumé

We introduce in this thesis the idea of a variable lookback model, i.e., a model whose predictions are based on a variable portion of the information set. We verify the intuition of this model in the context of experimental finance. We also propose a novel algorithm to estimate it, the variable lookback algorithm, and apply the latter to build investment strategies. Financial markets under information asymmetry are characterized by the presence of better-informed investors, also called insiders. The literature in finance has so far concentrated on theoretical models describing such markets, in particular on the role played by the price in conveying information from informed to uninformed investors. However, the implications of these theories have not yet been incorporated into processing methods to extract information from past prices and this is the aim of this thesis. More specifically, the presence of a time-varying number of insiders induces a time-varying predictability in the price process, which calls for models that use a variable lookback window. Moreover, although our initial motivation comes from the study of markets under information asymmetry, the problem is more general, as it touches several issues in statistical modeling. The first one concerns the structure of the model. Existing methods use a fixed model structure despite evidences from data, which support an adaptive one. The second one concerns the improper handling of the nonstationarity in data. The stationarity assumption facilitates the mathematical treatment. Hence, existing methods relies on some form of stationarity, for example, by assuming local stationary, as in the windowing approach, or by modeling the underlying switching process, for example, with a Markov chain of order 1. However, these suffer from certain limitations and more advanced methods that take explicitly into account the nonstationariry of the signal are desirable. In summary, there is a need to develop a method that constantly monitors what is the appropriate structure, when a certain model works and when not or when are the underlying assumptions of the model violated. We verify our initial intuition in the context of experimental finance. In particular, we highlight the diffusion of information in the market. We give a precise definition to the notion of the time of maximally informative price and verify, in line with existing theories, that the time of maximally informative price is inversely proportional to the number of insiders in the market. This supports the idea of a variable lookback model. Then, we develop an estimation algorithm that selects simultaneously the order of the process and the lookback window based on the minimum description length principle. The algorithm maintains a series of estimators, each based on a different order and/or information set. The selection is based on an information theoretic criterion, that accounts for the ability of the model to fit the data, penalized by the model complexity and the amount of switching between models. Finally, we put the algorithm at work and build investment strategies. We devise a method to draw dynamically the trend line for the time-series of log-prices and propose an adaptive version of the well-known momentum strategy. The latter outperforms standard benchmarks, in particular during the 2009 momentum crash.

Official source

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Concepts associés

Chargement

Publications associées

Chargement

Publications associées (30)

Chargement

Chargement

Chargement

Concepts associés (23)

Théorie

Une théorie (du grec theoria, « contempler, observer, examiner ») est un ensemble cohérent, si elle prétend à la scientificité, d'explications, de notions ou d'idées sur un sujet précis, pouvant inclu

Temps

thumb|Chronos, dieu du temps de la mythologie grecque, par Ignaz Günther, Bayerisches Nationalmuseum à Munich.
vignette|Montre à gousset ancienne
Le temps est une notion qui rend compte du changement

Série temporelle

thumb|Exemple de visualisation de données montrant une tendances à moyen et long terme au réchauffement, à partir des séries temporelles de températures par pays (ici regroupés par continents, du nord

The objective of this thesis is to develop probabilistic graphical models for analyzing human interaction in meetings based on multimodel cues. We use meeting as a study case of human interactions since research shows that high complexity information is mostly exchanged through face-to-face interactions. Modeling human interaction provides several challenging research issues for the machine learning community. In meetings, each participant is a multimodal data stream. Modeling human interaction involves simultaneous recording and analysis of multiple multimodal streams. These streams may be asynchronous, have different frame rates, exhibit different stationarity properties, and carry complementary (or correlated) information. In this thesis, we developed three probabilistic graphical models for human interaction analysis. The proposed models use the ``probabilistic graphical model'' formalism, a formalism that exploits the conjoined capabilities of graph theory and probability theory to build complex models out of simpler pieces. We first introduce the multi-layer framework, in which the first layer models typical individual activity from low-level audio-visual features, and the second layer models the interactions. The two layers are linked by a set of posterior probability-based features. Next, we describe the team-player influence model, which learns the influence of interacting Markov chains within a team. The team-player influence model has a two-level structure: individual-level and group-level. Individual level models actions of each player, and the group-level models actions of the team as a whole. The influence of each player on the team is jointly learned with the rest of the model parameters in a principled manner using the Expectation-Maximization (EM) algorithm. Finally, we describe the semi-supervised adapted HMMs for unusual event detection. Unusual events are characterized by a number of features (rarity, unexpectedness, and relevance) that limit the application of traditional supervised model-based approaches. We propose a semi-supervised adapted Hidden Markov Model (HMM) framework, in which usual event models are first learned from a large amount of (commonly available) training data, while unusual event models are learned by Bayesian adaptation in an unsupervised manner.

The objective of this thesis is to develop probabilistic graphical models for analyzing human interaction in meetings based on multimodel cues. We use meeting as a study case of human interactions since research shows that high complexity information is mostly exchanged through face-to-face interactions. Modeling human interaction provides several challenging research issues for the machine learning community. In meetings, each participant is a multimodal data stream. Modeling human interaction involves simultaneous recording and analysis of multiple multimodal streams. These streams may be asynchronous, have different frame rates, exhibit different stationarity properties, and carry complementary (or correlated) information. In this thesis, we developed three probabilistic graphical models for human interaction analysis. The proposed models use the ``probabilistic graphical model'' formalism, a formalism that exploits the conjoined capabilities of graph theory and probability theory to build complex models out of simpler pieces. We first introduce the multi-layer framework, in which the first layer models typical individual activity from low-level audio-visual features, and the second layer models the interactions. The two layers are linked by a set of posterior probability-based features. Next, we describe the team-player influence model, which learns the influence of interacting Markov chains within a team. The team-player influence model has a two-level structure: individual-level and group-level. Individual level models actions of each player, and the group-level models actions of the team as a whole. The influence of each player on the team is jointly learned with the rest of the model parameters in a principled manner using the Expectation-Maximization (EM) algorithm. Finally, we describe the semi-supervised adapted HMMs for unusual event detection. Unusual events are characterized by a number of features (rarity, unexpectedness, and relevance) that limit the application of traditional supervised model-based approaches. We propose a semi-supervised adapted Hidden Markov Model (HMM) framework, in which usual event models are first learned from a large amount of (commonly available) training data, while unusual event models are learned by Bayesian adaptation in an unsupervised manner.

Localizing the source of an epidemic is a crucial task in many contexts, including the detection of malicious users in social networks and the identification of patient zeros of disease outbreaks. The difficulty of this task lies in the strict limitations on the data available: In most cases, when an epidemic spreads, only few individuals, who we will call sensors, provide information about their state. Furthermore, as the spread of an epidemic usually depends on a large number of variables, accounting for all the possible spreading patterns that could explain the available data can easily result in prohibitive computational costs. Therefore, in the field of source localization, there are two central research directions: The design of practical and reliable algorithms for localizing the source despite the limited data, and the optimization of data collection, i.e., the identification of the most informative sensors. In this dissertation we contribute to both these directions. We consider network epidemics starting from an unknown source. The only information available is provided by a set of sensor nodes that reveal if and when they become infected. We study how many sensors are needed to guarantee the identification of the source. A set of sensors that guarantees the identification of the source is called a double resolving set (DRS); the minimum size of a DRS is called the double metric dimension (DMD). Computing the DMD is, in general, hard, hence estimating it with bounds is desirable. We focus on G(N,p) random networks for which we derive tight bounds for the DMD. We show that the DMD is a non-monotonic function of the parameter p, hence there are critical parameter ranges in which source localization is particularly difficult.
Again building on the relationship between source localization and DRSs, we move to optimizing the choice of a fixed number K of sensors. First, we look at the case of trees where the uniqueness of paths makes the problem simpler. For this case, we design polynomial time algorithms for selecting K sensors that optimize certain metrics of interest. Next, turning to general networks, we show that the optimal sensor set depends on the distribution of the time it takes for an infected node u to infect a non-infected neighbor v, which we call the transmission delay from u to v. We consider both a low- and a high-variance regime for the transmission delays. We design algorithms for sensor placement in both cases, and we show that they yield an improvement of up to 50% over state-of-the-art methods.
Finally, we propose a framework for source localization where some sensors (called dynamic sensors) can be added while the epidemic spreads and the localization progresses. We design an algorithm for joint source localization and dynamic sensor placement; This algorithm can handle two regimes: offline localization, where we localize the source after the epidemic spread, and online localization, where we localize the source while the epidemic is ongoing. We conduct an empirical study of offline and online localization and show that, by using dynamic sensors, the number of sensors we need to localize the source is up to 10 times less with respect to a strategy where all sensors are deployed a priori. We also study the resistance of our methods to high-variance transmission delays and show that, even in this setting, using dynamic sensors, the source can be localized with less than 5% of the nodes being sensors.