Intelligent information processing is one of the most challenging tasks in human-computer interaction. A central issue is how to model the various types of interaction between artificial and natural entities at different levels of abstraction. On the one hand, models of interaction are required to better understand communication phenomena; on the other hand, suitable languages and paradigms should provide powerful frameworks for developing computer-based applications. In this dissertation I focus on several aspects of the second problem, developing a methodology for the design of interactive natural language applications (ranging from question answering to mixed-initiative dialogue). One of my main concerns in this work is robustness. Several methods have been proposed for achieving robustness in natural language understanding, but these methods are often hard to scale up or to reuse across applications, and they tend to address a single linguistic level of processing rather than offering a global solution. I set up a Language Engineering environment whose goal is to combine software engineering and cognitive aspects (e.g. aspects related to the representation of a mental model of the speaker). Given its complexity, it is apparent that the problem can be solved only partially. I want to stress that the main contribution of my work is a holistic perspective on the problem of natural language understanding: rather than focusing on a particular aspect of natural language processing, I draw on the large body of work already done in Computational Linguistics and Computer Science, merging different ideas and techniques. In the first part of the dissertation I explore the field of Language Engineering in order to situate my contribution.
After a survey of the state of the art in robust analysis of natural language data, I focus on the role that Computational Logic plays in relating the syntactic and semantic analysis of natural language to its practical understanding within specific applications. Robustness is considered from two complementary perspectives, borrowing the terminology of modern software engineering: robustness "in the small" and robustness "in the large". The first perspective is discussed through an application for Interaction through Speech with Information Systems, where robust semantic parsing is used to extract queries from spoken natural language utterances. The second perspective is exemplified by the re-engineering of an existing text analysis system using a new Language Engineering methodology: Agent-Oriented Language Engineering. In the second part of the thesis I discuss how cognitive aspects can be integrated into a Language Engineering environment, leading to the notion of Cognitive Language Engineering. I tackle the difficult problem of robust dialogue management from both a cognitive and a computational perspective, and propose two frameworks for the semantic representation and assimilation of information into the dialogue information state. The first framework allows us to represent and reason about the dynamic aspects of objects and events; the second is centered on the notion of mental space and is used to build representations of the cognitive processing of information during communication.
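To make the idea of robust semantic parsing concrete, here is a toy sketch of my own (not the thesis system; all slot names and patterns are illustrative): instead of requiring a full grammatical parse, the parser fills whatever query slots it can recognize and simply ignores disfluencies and unparseable fragments.

```python
# Toy "robust" semantic parser: extract a structured query from a noisy
# spoken utterance by slot-pattern matching rather than full parsing.
# Slot names and patterns are purely illustrative.
import re

SLOTS = {
    "destination": r"\bto\s+([A-Z][a-z]+)",
    "departure":   r"\bfrom\s+([A-Z][a-z]+)",
    "time":        r"\b(\d{1,2}(?::\d{2})?\s*(?:am|pm)?)\b",
}

def parse(utterance: str) -> dict:
    """Return whichever slots can be filled; skip anything unrecognized."""
    query = {}
    for slot, pattern in SLOTS.items():
        m = re.search(pattern, utterance)
        if m:
            query[slot] = m.group(1).strip()
    return query

# Even with hesitations and fillers, the recoverable slots are extracted.
q = parse("uh I need a train to Geneva from Lausanne at uh 10:30 am")
```

The point of the sketch is the failure mode: a missing or garbled slot degrades the query gracefully instead of causing the whole analysis to fail.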
black-boxes". The Law of Parsimony states that "simpler solutions are more likely to be correct than complex ones". Since neural networks perform quite well in practice, a natural question to ask, then, is: in what way are they simple?
We propose that compression is the answer. Since good generalization requires invariance to irrelevant variations in the input, it is necessary for a network to discard this irrelevant information. As a result, semantically similar samples are mapped to similar representations in neural network deep feature space, where they form simple, low-dimensional structures.
Conversely, a network that overfits relies on memorizing individual samples. Such a network cannot discard information as easily.
In this thesis we characterize the difference between such networks using the non-negative rank of their activation matrices. The non-negative rank of a matrix is the smallest inner dimension for which an exact non-negative matrix factorization exists; it applies here because rectified-linear units produce non-negative activations by construction.
We derive an upper bound on the amount of memorization in terms of the non-negative rank, and show it is a natural complexity measure for rectified-linear units.
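The notion can be illustrated numerically. The sketch below (my own toy example, not the thesis' experiments) estimates how well a ReLU activation matrix is approximated by non-negative factorizations of increasing inner dimension; the smallest dimension at which the reconstruction error (near-)vanishes approximates the non-negative rank.

```python
# Probe the non-negative rank of a ReLU activation matrix by sweeping the
# inner dimension r of an NMF and measuring reconstruction error.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Stand-in for a layer's activations: ReLU outputs are non-negative.
pre_act = rng.normal(size=(100, 64))
A = np.maximum(pre_act, 0.0)           # ReLU: entrywise max(x, 0)

def nmf_error(A, r):
    """Relative Frobenius error of the best rank-r NMF found."""
    model = NMF(n_components=r, init="nndsvda", max_iter=500, random_state=0)
    W = model.fit_transform(A)          # (samples, r), non-negative
    H = model.components_               # (r, features), non-negative
    return np.linalg.norm(A - W @ H) / np.linalg.norm(A)

# Error shrinks as r grows; the smallest r with (near-)zero error
# approximates the non-negative rank of A.
errors = {r: nmf_error(A, r) for r in (4, 16, 64)}
```

Since NMF is non-convex, each call finds only a local optimum, so this gives an upper estimate of the error at each rank rather than an exact computation.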
With a focus on deep convolutional neural networks trained to perform object recognition, we show that the two non-negative factors derived from deep network layers decompose the information held therein in an interpretable way. The first of these factors provides heatmaps which highlight similarly encoded regions within an input image or image set. We find that these networks learn to detect semantic parts and form a hierarchy, such that parts are further broken down into sub-parts.
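The mechanics of turning one factor into heatmaps can be sketched as follows (shapes and names are illustrative stand-ins, not the thesis code): flatten a convolutional activation tensor so that rows index spatial positions, factorize it, and reshape the spatial factor back into per-concept heatmaps.

```python
# Factorize a conv-layer activation tensor and recover spatial heatmaps
# from the first non-negative factor. All sizes are toy placeholders.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
n_images, h, w, c = 4, 7, 7, 32        # stand-in for a conv layer's output
acts = np.maximum(rng.normal(size=(n_images, h, w, c)), 0.0)  # ReLU output

k = 3                                   # number of "concepts" to extract
A = acts.reshape(n_images * h * w, c)   # rows = spatial positions, cols = channels
model = NMF(n_components=k, init="nndsvda", max_iter=300, random_state=0)
W = model.fit_transform(A)              # (n_images*h*w, k): spatial factor
H = model.components_                   # (k, c): per-concept channel mixture

# Each column of W, reshaped, is a heatmap: how strongly concept j is
# present at each spatial location of each image in the set.
heatmaps = W.reshape(n_images, h, w, k)
```

Because both factors are non-negative, each heatmap reads as an additive "presence" map, which is what makes the decomposition interpretable.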
We quantitatively evaluate the semantic quality of these heatmaps by using them to perform semantic co-segmentation and co-localization. Although the convolutional network we use is trained solely with image-level labels, we achieve results comparable to or better than those of domain-specific state-of-the-art methods for these tasks.
The second non-negative factor provides a bag-of-concepts representation for an image or image set. We use this representation to derive global image descriptors for images in a large collection. With these descriptors in hand, we perform two variations of content-based image retrieval, i.e. reverse image search. Using information from one of the non-negative matrix factors, we obtain descriptors suitable for finding semantically related images, i.e. images belonging to the same semantic category as the query image. Combining information from both non-negative factors, however, yields descriptors suitable for finding other images of the specific instance depicted in the query image; here again we achieve state-of-the-art performance.
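The retrieval step itself reduces to a standard pipeline, sketched below under assumed shapes (this is an illustration of bag-of-concepts retrieval in general, not the thesis implementation): pool per-image concept maps into a global descriptor, L2-normalize, and rank the collection by cosine similarity to a query.

```python
# Bag-of-concepts retrieval sketch: pool non-negative concept heatmaps
# into global descriptors and rank a collection against a query image.
import numpy as np

rng = np.random.default_rng(2)
n_images, h, w, k = 10, 7, 7, 5
# Stand-in for per-image concept heatmaps (non-negative, as from NMF).
heatmaps = np.abs(rng.normal(size=(n_images, h, w, k)))

# Global descriptor: average concept strength over spatial positions,
# then L2-normalize so a dot product equals cosine similarity.
desc = heatmaps.mean(axis=(1, 2))                      # (n_images, k)
desc /= np.linalg.norm(desc, axis=1, keepdims=True)

query = desc[0]                                        # image 0 as the query
scores = desc @ query                                  # cosine similarities
ranking = np.argsort(-scores)                          # best match first
```

Average pooling discards the spatial layout and keeps only concept frequencies, which is why this variant retrieves category-level matches; instance-level retrieval needs the spatial factor as well.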