Efficient image duplicate detection based on image analysis

Yannick Maret
2007
EPFL thesis

Abstract

This thesis is about the detection of duplicated images. More precisely, the developed system is able to discriminate possibly modified copies of original images from other unrelated images. The proposed method is referred to as content-based since it relies only on content analysis techniques rather than using image tagging as done in watermarking. The proposed content-based duplicate detection system classifies a test image by associating it with a label that corresponds to one of the original known images. The classification is performed in four steps. In the first step, the test image is described by using global statistics about its content. In the second step, the most likely original images are efficiently selected using a spatial indexing technique called R-Tree. The third step consists in using binary detectors to estimate the probability that the test image is a duplicate of the original images selected in the second step. Indeed, each original image known to the system is associated with an adapted binary detector, based on a support vector classifier, that estimates the probability that a test image is one of its duplicate. Finally, the fourth and last step consists in choosing the most probable original by picking that with the highest estimated probability. Comparative experiments have shown that the proposed content-based image duplicate detector greatly outperforms detectors using the same image description but based on a simpler distance functions rather than using a classification algorithm. Additional experiments are carried out so as to compare the proposed system with existing state of the art methods. Accordingly, it also outperforms the perceptual distance function method, which uses similar statistics to describe the image. While the proposed method is slightly outperformed by the key points method, it is five to ten times less complex in terms of computational requirements. Finally, note that the nature of this thesis is essentially exploratory since it is one of the first attempts to apply machine learning techniques to the relatively recent field of content-based image duplicate detection.

Official source

https://infoscience.epfl.ch/record/103736?ln=en

About this result

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Efficient image duplicate detection based on image analysis

Graph Chatbot

Chat with Graph Search

Aggregating Spatial and Photometric Context for Photometric Stereo

BigNeuron: a resource to benchmark and predict performance of algorithms for automated tracing of neurons in light microscopy datasets

Attribute Prediction as Multiple Instance Learning

Aggregating Spatial and Photometric Context for Photometric Stereo

Attribute Prediction as Multiple Instance Learning

BigNeuron: a resource to benchmark and predict performance of algorithms for automated tracing of neurons in light microscopy datasets