Michael Calonder
A typical Computer Vision system needs to process vast amounts of data as captured by one or more cameras, constantly testing the capabilities of today's hardware. Yet such systems face an ever-growing computational load caused by the more and more demanding applications which is further amplified by the trend to run the software on computationally weaker devices such as mobile phones, for example. Moreover, it is imperative for most real-world systems, be it for object detection or mobile robotics, to run in real-time so as to serve the purpose. Although applications of such systems can vary greatly, most systems share the same outset in the processing pipeline: They establish sparse point matches across multiple views of rigid scenes, which is known as the interest point matching problem. Being a very central problem, a number of algorithms offer fairly accurate solutions. However, they force their superordinate application to spend most of the available CPU time on matching, leaving only little room for subsequent tasks in the pipeline. In this thesis we propose two families of algorithms, Signatures and BRIEF, that address these difficulties. Both of them exhibit comparable accuracy to existing methods and improve up to two orders of magnitude on overall speed and one order of magnitude on memory consumption. Signatures and especially BRIEF are conceptually of downright simplicity, offering a competitive and easy-to-implement option for building a solid base for real-time applications with a stringent time budget. We demonstrate the effectiveness by implementing Signatures and BRIEF in four real-time systems. Technically, Signatures employ Randomized Trees to quantify the similarity of two image patches around arbitrary interest points, resulting in a robust description vector. Thanks to the binary tests that the Trees use for indexing image patches into the leaves, they are very efficient to evaluate which directly carries over to Signatures. BRIEF is the ultimate reduction of the binary test idea and uses such tests directly on a smoothed version of the patch to describe it. Doing so can be seen as computing thresholded random gradients resulting in binary description vectors which require less memory than their state-of-the-art counterparts and are particularly efficient to match —especially on recent hardware.
EPFL2010