
Publication

# Vision-Based Sense and Avoid Algorithms for Unmanned Aerial Vehicles

Abstract

The field of Unmanned Aerial Vehicles (UAVs), also known as drones, is growing rapidly, both in fleet size and in the number of applications. Civil applications range from mapping, inspection, search and rescue, and aerial footage to art shows, entertainment, and more. Currently, most applications have a human pilot supervising or controlling the vehicles, but UAVs are expected to gain more autonomy over time. To fly in general airspace, shared with both general and commercial aviation, UAVs require a high level of autonomy. A core capability for flying in general airspace is the ability to detect and avoid collisions with other aircraft or objects, handled by a so-called Sense And Avoid (SAA) system. Among the sensors investigated for SAA, vision-based sensors are a strong candidate: they can detect and identify a wide variety of objects, and they are closest to a human pilot's main means of detecting aircraft and other objects. To remain as general as possible, this work focuses on non-cooperative algorithms that make no assumptions about the motion of other aircraft. This thesis presents algorithms for a vision-based SAA system. It focuses on the relationship between sensing and avoidance, and on how the limitations of one constrain the other. In particular, it studies the consequences of the limited Field Of View (FOV) of a camera sensor for the collision avoidance algorithms. Given the assumptions above, the sensing and tracking of other UAVs is performed using cameras with fish-eye lenses whose FOV is large enough for the collision avoidance algorithms to guarantee collision-free flight. The detection of other UAVs is performed with one of two methods: a marker-based or a marker-less computer vision algorithm.
Using the measurements from the computer vision algorithm, the positions and velocities of neighboring UAVs are tracked with a Gaussian mixture probability hypothesis density filter. This tracking algorithm can track multiple UAVs while requiring few computational resources, making it a suitable candidate for on-board deployment. The thesis proves mathematically that the motion of a UAV has to be constrained according to the FOV of its sensor. Building on that result, several collision avoidance algorithms are adapted to ensure collision-free navigation when used with a sensor with a limited FOV. Sensory limitations such as noise, lag, limited range, and limited FOV, and their effects on the performance of collision avoidance algorithms, are studied. Experimental work using high-fidelity simulation and real robots shows that algorithms that use only position information from the sensors are overall more reliable, although less efficient (in terms of distance traveled or trajectory smoothness), than algorithms that also use velocity estimates from the sensing system.
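The FOV constraint at the heart of the thesis can be illustrated with a minimal sketch (the 2-D geometry, function name, and numbers below are ours, for illustration only): an avoidance manoeuvre can only be guaranteed safe while the intruder's bearing stays within the camera's field of view.

```python
import math

def bearing_in_fov(own_pos, own_heading, target_pos, fov):
    """Return True if the bearing from own_pos to target_pos lies
    within +/- fov/2 of own_heading (all angles in radians)."""
    dx = target_pos[0] - own_pos[0]
    dy = target_pos[1] - own_pos[1]
    bearing = math.atan2(dy, dx)
    # wrap the angular difference into [-pi, pi]
    diff = (bearing - own_heading + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= fov / 2

# A target ~60 degrees off the heading is visible to a 180-degree
# fish-eye lens but not to a narrow 60-degree camera.
print(bearing_in_fov((0, 0), 0.0, (1.0, 1.732), math.radians(180)))  # True
print(bearing_in_fov((0, 0), 0.0, (1.0, 1.732), math.radians(60)))   # False
```

A collision avoidance planner can reject any candidate motion for which such a check would fail; this is the kind of motion constraint the thesis derives formally.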

Official source


Related concepts (18)

Drone

[Image captions: a Parrot AR.Drone in front of a Dassault Rafale; an EADS Harfang reconnaissance drone at the 2007 Paris Air Show; an OnyxStar Fox-C8 XT civil drone in flight.]

Algorithm

[Image caption: algorithm for cutting an arbitrary polygon into triangles (triangulation).]
An algorithm is a finite, unambiguous sequence of instructions and operations for solving a class of problems.

System

A system is a set of elements interacting according to certain principles or rules: for example, a molecule, the solar system, a beehive, a human society, a political party, or an army.

Related publications (13)

Vision-based inertial-aided navigation is gaining ground thanks to its many potential applications. In previous decades, the integration of vision and inertial sensors was monopolised by the defence industry because of its complexity and prohibitive cost. As the technology advanced, high-quality hardware and computing power became affordable enough to investigate and realise a range of applications. In this thesis, a mapping system based on vision-aided inertial navigation was developed for areas where GNSS signals are unavailable, for example indoors, in tunnels, city canyons, or forests. In this framework, a methodology for integrating vision and inertial sensors was presented, analysed, and tested for the case where the only information available at the start is a number of features with known locations/coordinates (with no GNSS signal accessibility), thus employing "SLAM: Simultaneous Localisation And Mapping". SLAM is a term used in the robotics community to describe the problem of mapping the environment while at the same time using this map to determine (or to help determine) the location of the mapping device. In addition, a link between the robotics and geomatics communities was established, briefly outlining their similarities and differences in handling the navigation and mapping problem. Despite many differences, the goal is common: developing a "navigation and mapping system" that is not bounded by the limits imposed by the sensors used. Classically, terrestrial robotics SLAM is approached using laser scanners to locate the robot relative to a structured environment and to map this environment at the same time. However, outdoor robotics SLAM is not feasible with laser scanners alone because of the environment's roughness and the absence of simple geometric features. Recently, the robotics community has taken an interest in visual methods integrated with inertial sensors.
These visual methods rely on one or more cameras (or video) and use a single Kalman Filter whose state vector contains both the map and the robot coordinates. This concept introduces high non-linearity and complications into the filter, which then needs to run at high rates (more than 20 Hz) with simplified navigation and mapping models. In this study, SLAM is developed using the geomatics engineering approach. Two filters are used in parallel: a Least-Squares Adjustment (LSA) for determining feature coordinates and a Kalman Filter (KF) for navigation correction. For this, a mobile mapping system (independent of GPS) is introduced, employing two CCD cameras (one metre apart) and one IMU. Conceptually, the outputs of the LSA photogrammetric resection (position and orientation) are used as the external measurements for the inertial KF. The filtered position and orientation are subsequently employed in photogrammetric intersection to map the surrounding features, which are used as control points for the resection in the next epoch. In this manner, the KF takes the form of a navigation-only filter, with a state vector containing the corrections to the navigation parameters. This way, the mapping and localisation can be updated at low rates (1 to 2 Hz) and use more complete modelling. Results show that this method is feasible, with limitations induced by the quality of the images and the number of features used. Although simulation showed that (depending on the image geometry) the features' coordinates can be determined with an accuracy of 5-10 cm for objects at distances of up to 10 metres, in practice this is not achieved with the employed hardware and pixel measurement techniques. Navigational accuracy likewise depends on the quality of the images and on the number and accuracy of the points used in the resection.
While more than 25 points are needed to achieve centimetre accuracy from resection, they have to be within 10 metres of the cameras; otherwise, the resection output is of insufficient accuracy and the quality of further integration deteriorates. The initial conditions, namely the method of IMU initialisation and the a-priori assumptions on error distribution, strongly affect SLAM performance. The geometry of the system furthermore constrains the possible applications. To conclude, the development consisted in establishing a mathematical framework and in implementing methods and algorithms for a novel integration methodology between vision and inertial sensors. The implementation and validation of the software presented the main challenges, and the system can be considered a first of its kind, in which all components were developed from scratch with no pre-existing modules. Finally, simulations and practical tests were carried out, from which initial conclusions and recommendations were drawn to build upon. It is the author's hope that this work will stimulate others to investigate this interesting problem further, taking into account the conclusions and recommendations sketched herein.
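The navigation-only KF described above can be sketched in a few lines. This is a generic 1-D constant-velocity Kalman filter, not the thesis's implementation; the function name and the noise levels q and r are illustrative:

```python
import numpy as np

def kf_step(x, P, z, dt, q=1e-3, r=0.05**2):
    """One predict/update cycle of a navigation-only Kalman filter.

    x: state [position, velocity]; P: 2x2 state covariance.
    z: external position fix (standing in for the LSA resection output).
    q, r: illustrative process and measurement noise levels.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity prediction
    x = F @ x
    P = F @ P @ F.T + q * np.eye(2)
    H = np.array([[1.0, 0.0]])              # only position is measured
    S = H @ P @ H.T + r                     # innovation covariance
    K = P @ H.T / S                         # Kalman gain
    x = x + (K * (z - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Track a platform moving at 1 m/s from position-only fixes at 10 Hz.
x, P = np.array([0.0, 0.0]), np.eye(2)
for t in range(1, 201):
    x, P = kf_step(x, P, z=0.1 * t, dt=0.1)
print(x)  # position estimate near 20 m, velocity estimate near 1 m/s
```

In the thesis's architecture the measurement z would come from the LSA resection at 1 to 2 Hz, and the state vector would hold corrections to the full navigation parameters rather than position and velocity directly.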

Over the past few decades we have been experiencing a data explosion: massive amounts of data are collected, and multimedia databases such as YouTube and Flickr are rapidly expanding. At the same time, rapid technological advancements in mobile devices and vision sensors have led to the emergence of novel multimedia mining architectures. These produce even more multimedia data, which may be captured under geometric transformations and need to be stored and analyzed efficiently. It is also common in such systems for data to be collected distributively. This poses great challenges in the design of effective methods for analysis and knowledge discovery from multimedia data. In this thesis, we study various instances of the problem of classifying visual data in light of these modern challenges. Roughly speaking, classification is the problem of categorizing an observed object into a particular class (or category) based on previously seen examples. We address important issues related to classification, namely flexible data representation for joint coding and classification, robust classification under large geometric transformations, and classification with multiple object observations in both centralized and distributed settings. We start by identifying the need for flexible data representation methods that are efficient for both storage and classification of multimedia data. Such flexible schemes can significantly improve the efficiency of current multimedia mining systems, as they permit the classification of multimedia patterns directly in their compressed form, without decompression. We propose a framework, called semantic approximation, which is based on sparse data representations.
It formulates dimensionality reduction as a matrix factorization problem under hybrid criteria posed as a trade-off between approximation, for efficient compression, and discrimination, for effective classification. We demonstrate that the proposed methodology competes with state-of-the-art solutions in image classification and face recognition, implying that compression and classification can be performed jointly without performance penalties with respect to expensive disjoint solutions. Next, we allow the multimedia patterns to be geometrically transformed and we focus on transformation invariance issues in pattern classification. When a pattern is transformed, it spans a manifold in a high-dimensional space. We focus on the problem of computing the distance of a test pattern from the manifold, which is closely related to the image alignment problem. This is a hard non-convex problem that has previously only been addressed sub-optimally. We represent transformation manifolds based on sparse geometric expansions, which yields a closed-form representation of the manifold equation with respect to the transformation parameters. When the transformation consists of a combination of translations, rotations, and scalings, we prove that the objective function of this problem can be decomposed as a difference of convex functions (DC). This property allows us to solve our optimization problem optimally with a cutting plane algorithm, which is well known to find the global minimizer successfully in practice. We showcase applications in robust face recognition and image alignment. The classification problem with multiple observations is addressed next. Multiple observations are typically produced in practice when an object is observed over successive time instants or under different viewpoints. In particular, we focus on the problem of classifying an object when multiple geometrically transformed observations of it are available.
These multiple observations typically belong to a manifold, and the classification problem consists in determining which manifold the observations belong to. We show that this problem can be viewed as a special case of semi-supervised learning where all unlabelled examples belong to the same class. We design a graph-based algorithm that exploits the structure of the manifold. Estimating the unknown object class then results in a discrete optimization problem that can be solved efficiently. We show the performance of our algorithm in the classification of multiple handwritten digit images and in video-based face recognition. Next, we study the problem of classification of multiple observations in distributed scenarios, such as camera networks. In this case the classification is performed iteratively and distributively, without a central coordinator node. The main goal is to reach a global classification decision using only local computation and communication, while ensuring robustness to changes in network topology. We propose to use consensus algorithms to design a distributed version of the aforementioned graph-based algorithm. We show that the distributed classification algorithm performs similarly to its centralized counterpart, provided that the training set is sufficiently large. Finally, we delve further into the convergence properties of consensus-based distributed algorithms and propose an acceleration methodology for fast convergence that uses the memory of the sensors. Our simulations show that convergence is indeed accelerated in both static and dynamic networks, and that distributed classification in sensor networks can significantly benefit from it. Overall, the present thesis addresses several important issues related to pattern analysis and classification in modern multimedia systems.
Our solutions for semantic approximation and transformation invariance can impact the efficiency and robustness of classification in multimedia systems. Furthermore, our graph-based framework for multiple observations is able to perform effective classification in both centralized and distributed environments. Finally, our fast consensus algorithms can significantly contribute to the accelerated convergence of distributed classification algorithms in sensor networks.
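The consensus mechanism underlying the distributed classifier can be sketched as plain average consensus, where each node repeatedly nudges its value toward those of its neighbours (the graph, step size, and values below are illustrative, not from the thesis):

```python
import numpy as np

def consensus(values, neighbors, eps=0.2, iters=100):
    """Average consensus: every node repeatedly moves toward its
    neighbours' values. On a connected graph with eps below the
    inverse of the maximum node degree, all nodes converge to the
    global average without any central coordinator."""
    x = np.array(values, dtype=float)
    for _ in range(iters):
        x_new = x.copy()
        for i, nbrs in enumerate(neighbors):
            x_new[i] += eps * sum(x[j] - x[i] for j in nbrs)
        x = x_new
    return x

# Four nodes in a ring, each starting from a local "vote".
ring = [[1, 3], [0, 2], [1, 3], [0, 2]]
print(consensus([1, 2, 3, 4], ring))  # every entry converges to 2.5
```

In a distributed classifier, the values being averaged would be local class scores rather than raw numbers; the acceleration studied in the thesis speeds up exactly this iteration by letting nodes exploit their memory of past values.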

Noise radiated by the various industrial structures that surround us in daily life is increasingly regarded as environmental pollution. Standards defining a tolerable sound level for each of these noise sources are regularly called into question, and respecting them becomes an additional constraint for manufacturers. The last decades have seen the increasing development of means to fight these noise disturbances. The wide diversity of noises perceived as harmful has contributed to the progressive increase in the specificity and efficiency of the solutions proposed to reduce them. For some applications, passive solutions, based on materials capable of absorbing or deviating acoustic or vibratory waves, have progressively been replaced by "active" solutions, based on generating an acoustic wave in opposite phase to the disturbing one radiated by the noise source. Power transformers are sources for which passive solutions (anti-noise walls) are usually expensive and not very efficient, particularly at low frequency. Furthermore, the characteristics of the noise radiated by transformers (low-frequency tonal noise) make them particularly well suited to an active solution. Typically, an active control system dedicated to transformers (feedforward) is composed of actuators, used to generate the anti-noise and usually located near the tank; sensors, used to measure the attenuation obtained and to provide a reference signal; and a controller, used to drive the actuators as a function of the information collected by the sensors. When the sensors are microphones, they are usually moved away from the noise source, so that they pick up an acoustic pressure representative of the noise propagated far away. In the vicinity of the source, local acoustic phenomena can occur that do not propagate far away.
A microphone located near the noise source would therefore pick up an acoustic pressure that does not necessarily represent the one that actually exists far away, the area where we are in fact seeking to reduce the noise. These phenomena, frequently grouped under the term "nearfield", tend to decrease the performance of the control system, because the controller then seeks to reduce an acoustic pressure that is not representative of the noise to be reduced. In practice, the significant amount of wiring required by positioning the microphones in the far field has a non-negligible effect on the cost of the system. The possibility of bringing the sensors closer to the source is advantageous enough to make it worth studying its feasibility and consequences. The present approach consists in representing the primary field radiated by a vibrating structure in terms of a set of "acoustic radiation modes". The use of radiation modes to characterise the behaviour of a structure has received increasing attention since the early 1990s, especially for active control applications. Usually, this approach consists in representing the radiated power in terms of a set of surface velocity distributions, called radiation modes, that have the property of radiating independently of each other. Radiation modes result from the diagonalization of a discrete expression of the radiation operator. We propose here to study the consequences on the radiation modes of a structure of bringing the microphones closer to it. We will study how the acoustic field varies with distance and how this can be used to obtain a model whose complexity is adapted to the observation distance.
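The diagonalisation step that produces radiation modes can be sketched numerically. Here R is a stand-in symmetric radiation resistance matrix (the real operator is built from the structure's geometry and the acoustic wavenumber); its eigenvectors are the radiation modes, and the radiated power splits into independent per-mode contributions:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
R = A @ A.T                 # stand-in symmetric radiation resistance matrix

# Diagonalisation: the eigenvectors (columns of `modes`) are the
# radiation modes; the eigenvalues are their radiation efficiencies.
eigvals, modes = np.linalg.eigh(R)

# Independence: for any surface velocity v, the radiated power
# W = v^T R v equals a sum of per-mode contributions, no cross terms.
v = rng.standard_normal(5)
a = modes.T @ v             # modal amplitudes
power_direct = v @ R @ v
power_modal = np.sum(eigvals * a**2)
print(np.isclose(power_direct, power_modal))  # True
```

This mutual independence is what makes radiation modes attractive for active control: attenuating the most efficient modes reduces the radiated power without the modes interacting.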