**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Publication# Prior knowledge in Kernel methods

Abstract

Machine Learning is a modern and actively developing field of computer science, devoted to extracting and estimating dependencies from empirical data. It combines such fields as statistics, optimization theory and artificial intelligence. In practical tasks, the general aim of Machine Learning is to construct algorithms able to generalize and predict in previously unseen situations based on some set of examples. Given some finite information, Machine Learning provides ways to exract knowledge, describe, explain and predict from data. Kernel Methods are one of the most successful branches of Machine Learning. They allow applying linear algorithms with well-founded properties such as generalization ability, to non-linear real-life problems. Support Vector Machine is a well-known example of a kernel method, which has found a wide range of applications in data analysis nowadays. In many practical applications, some additional prior knowledge is often available. This can be the knowledge about the data domain, invariant transformations, inner geometrical structures in data, some properties of the underlying process, etc. If used smartly, this information can provide significant improvement to any data processing algorithm. Thus, it is important to develop methods for incorporating prior knowledge into data-dependent models. The main objective of this thesis is to investigate approaches towards learning with kernel methods using prior knowledge. Invariant learning with kernel methods is considered in more details. In the first part of the thesis, kernels are developed which incorporate prior knowledge on invariant transformations. They apply when the desired transformation produce an object around every example, assuming that all points in the given object share the same class. Different types of objects, including hard geometrical objects and distributions are considered. These kernels were then applied for images classification with Support Vector Machines. Next, algorithms which specifically include prior knowledge are considered. An algorithm which linearly classifies distributions by their domain was developed. It is constructed such that it allows to apply kernels to solve non-linear tasks. Thus, it combines the discriminative power of support vector machines and the well-developed framework of generative models. It can be applied to a number of real-life tasks which include data represented as distributions. In the last part of the thesis, the use of unlabelled data as a source of prior knowledge is considered. The technique of modelling the unlabelled data with a graph is taken as a baseline from semi-supervised manifold learning. For classification problems, we use this apporach for building graph models of invariant manifolds. For regression problems, we use unlabelled data to take into account the inner geometry of the input space. To conclude, in this thesis we developed a number of approaches for incorporating some prior knowledge into kernel methods. We proposed invariant kernels for existing algorithms, developed new algorithms and adapted a technique taken from semi-supervised learning for invariant learning. In all these cases, links with related state-of-the-art approaches were investigated. Several illustrative experiments were carried out on real data on optical character recognition, face image classification, brain-computer interfaces, and a number of benchmark and synthetic datasets.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related concepts

Loading

Related publications

Loading

Related publications (110)

Loading

Loading

Loading

Artificial intelligence (AI) and machine learning (ML) have become de facto tools in many real-life applications to offer a wide range of benefits for individuals and our society. A classic ML model is typically trained with a large-scale static dataset in an offline manner. Therefore, it can not quickly capture new knowledge in non-stationary environments, and it is difficult to maintain long-term memory for knowledge learned earlier. In practice, many ML systems often need to learn new knowledge (e.g., domains, tasks, distributions, etc.) as more data and experiences are collected, which is referred to as a lifelong ML paradigm in this thesis. We focus on two fundamental challenges to achieve lifelong learning. The first challenge is to quickly learn new knowledge with a small number of observations, and we refer to it as data efficiency. The second challenge is to prevent an ML system from forgetting the old knowledge it has previously learned, and we refer to this challenge as knowledge retention. These two capabilities are crucial for applying ML to most practical applications. In this thesis, we study three important applications with these two challenges, including recommendation systems, task-oriented dialog systems, and the image classification task.
First, we propose two approaches to improve data efficiency for task-oriented dialog systems. The first proposed approach is based on Meta-learning, aiming to learn a better model parameter initialization from training data. It can quickly reach a good parameter region of new domains or tasks with a small number of labeled data. The second proposal takes a semi-supervised self-training approach to iteratively train a better model using sufficient unlabeled data when only a limited number of labeled data are available. We empirically demonstrate that both approaches effectively improve data efficiency to learn new knowledge. The second self-training method even consistently improves state-of-the-art large-scale pre-trained models.
Second, we tackle the knowledge retention challenge to mitigate the detrimental catastrophic forgetting issue when neural networks learn new knowledge sequentially. We formulate and investigate the ``continual learning'' setting for task-oriented dialog systems and recommendation systems. Through extensive empirical evaluation and analysis, we demonstrate the importance of (1) exemplar replay: storing representative historical data and replaying them to the model while learning new knowledge; (2) dynamic regularization: applying a dynamic regularization term to put flexible constraints on not forgetting previously learned knowledge in each model update cycle.
Lastly, we conduct several initial attempts to achieve both data efficiency and knowledge retention in a unified framework. In the recommendation scenario, we propose two approaches using different non-parametric memory modules to retain long-term knowledge. More importantly, the two proposed non-parametric predictions computed on top of them help learn and memorize new knowledge in a data-efficient manner. Apart from the recommendation scenario, we propose a probabilistic evaluation protocol in the widely studied image classification domain. It is general and versatile to simulate a wide range of realistic lifelong learning scenarios that require both knowledge retention and data efficiency for studying different techniques. Through experiments, we also demonstrate the benefit

Related concepts (53)

Knowledge is a form of awareness or familiarity. It is often understood as awareness of facts or as practical skills, and may also mean familiarity with objects or situations. Knowledge of facts, al

Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of human beings or animals. AI applications include advanced web search engines (e.g., Google

Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes, and preferences. The ability to learn is possessed by humans, animals, and some machines; th

Machine Learning is a modern and actively developing field of computer science, devoted to extracting and estimating dependencies from empirical data. It combines such fields as statistics, optimization theory and artificial intelligence. In practical tasks, the general aim of Machine Learning is to construct algorithms able to generalize and predict in previously unseen situations based on some set of examples. Given some finite information, Machine Learning provides ways to exract knowledge, describe, explain and predict from data. Kernel Methods are one of the most successful branches of Machine Learning. They allow applying linear algorithms with well-founded properties such as generalization ability, to non-linear real-life problems. Support Vector Machine is a well-known example of a kernel method, which has found a wide range of applications in data analysis nowadays. In many practical applications, some additional prior knowledge is often available. This can be the knowledge about the data domain, invariant transformations, inner geometrical structures in data, some properties of the underlying process, etc. If used smartly, this information can provide significant improvement to any data processing algorithm. Thus, it is important to develop methods for incorporating prior knowledge into data-dependent models. The main objective of this thesis is to investigate approaches towards learning with kernel methods using prior knowledge. Invariant learning with kernel methods is considered in more details. In the first part of the thesis, kernels are developed which incorporate prior knowledge on invariant transformations. They apply when the desired transformation produce an object around every example, assuming that all points in the given object share the same class. Different types of objects, including hard geometrical objects and distributions are considered. These kernels were then applied for images classification with Support Vector Machines. Next, algorithms which specifically include prior knowledge are considered. An algorithm which linearly classifies distributions by their domain was developed. It is constructed such that it allows to apply kernels to solve non-linear tasks. Thus, it combines the discriminative power of support vector machines and the well-developed framework of generative models. It can be applied to a number of real-life tasks which include data represented as distributions. In the last part of the thesis, the use of unlabelled data as a source of prior knowledge is considered. The technique of modelling the unlabelled data with a graph is taken as a baseline from semi-supervised manifold learning. For classification problems, we use this apporach for building graph models of invariant manifolds. For regression problems, we use unlabelled data to take into account the inner geometry of the input space. To conclude, in this thesis we developed a number of approaches for incorporating some prior knowledge into kernel methods. We proposed invariant kernels for existing algorithms, developed new algorithms and adapted a technique taken from semi-supervised learning for invariant learning. In all these cases, links with related state-of-the-art approaches were investigated. Several illustrative experiments were carried out on real data on optical character recognition, face image classification, brain-computer interfaces, and a number of benchmark and synthetic datasets.

Samy Bengio, Alexei Pozdnoukhov

This paper presents a new algorithm for classifying distributions. The algorithm combines the principle of margin maximization and a kernel trick, applied to distributions. Thus, it combines the discriminative power of support vector machines and the well-developed framework of generative models. It can be applied to a number of real-life tasks which include data represented as distributions. The algorithm can also be applied for introducing some prior knowledge on invariances into a discriminative model. We illustrate this approach in details for the case of Gaussian distributions, using a toy problem. We also present experiments devoted to the real-life problem of invariant image classification.