We used to say “seeing is believing”; this is no longer true. Digitization is changing all aspects of life and business. One of the more noticeable impacts is on how business documents are authored, exchanged, and processed. Many documents, such as passports and IDs, are first created in paper form but are immediately scanned, digitized, and further processed in electronic form. Widely available photo editing software makes image manipulation child's play, tremendously increasing the amount of forged content. With growing concerns over the authenticity and integrity of scanned and image-based documents such as passports and IDs, it is urgent to be able to quickly validate scanned and photographic documents. The same machine learning that is behind some of the most successful content manipulation solutions can also be used as a countermeasure to detect them. In this paper, we describe an efficient detector for recaptured digital documents based on machine learning. The core of the system is a binary classifier based on a support vector machine (SVM), trained on authentic and recaptured digital passports. The detector reports when it encounters a digital document that is the result of photographing another digital document displayed on an LCD monitor. To assess the proposed detector, a dedicated dataset of authentic passports and passports recaptured with a number of different cameras was created. Several experiments were set up to assess the overall performance of the detector as well as its efficacy in special situations, such as when the machine learning engine is trained on a specific type of camera or when it encounters a new type of camera for which it was not trained. Results show that the accuracy of the detector remains above 90 percent in the large majority of cases.
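The evaluation protocol described in the abstract (train a binary SVM on authentic versus recaptured samples, then test on both a known and an unseen camera) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two-dimensional features are synthetic Gaussian clusters rather than the image features used in the paper, the "camera shift" is a hypothetical constant offset, and the linear SVM is trained with plain stochastic subgradient descent on the hinge loss rather than with whatever solver and kernel the authors used.

```python
import random

def make_samples(rng, n, center, label):
    """Draw n synthetic 2-D feature vectors around `center` (placeholder features)."""
    return [([rng.gauss(center[0], 0.5), rng.gauss(center[1], 0.5)], label)
            for _ in range(n)]

def train_linear_svm(data, epochs=100, lr=0.01, lam=0.01, seed=0):
    """Linear SVM via stochastic subgradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x, y in order:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term always shrinks w slightly.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                # Hinge loss is active: move the boundary toward classifying x correctly.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)

rng = random.Random(42)
# Labels: +1 = recaptured, -1 = authentic (synthetic stand-ins, not real passports).
train_cam_a = make_samples(rng, 100, (2.0, 2.0), 1) + make_samples(rng, 100, (-2.0, -2.0), -1)
test_cam_a = make_samples(rng, 50, (2.0, 2.0), 1) + make_samples(rng, 50, (-2.0, -2.0), -1)
# Hypothetical unseen camera: the same classes with a constant feature offset.
test_cam_b = make_samples(rng, 50, (2.5, 2.5), 1) + make_samples(rng, 50, (-1.5, -1.5), -1)

model = train_linear_svm(train_cam_a)
acc_same = accuracy(model, test_cam_a)
acc_new = accuracy(model, test_cam_b)
print(f"same camera: {acc_same:.2f}, unseen camera: {acc_new:.2f}")
```

Reporting the two accuracies separately mirrors the distinction the abstract draws between cameras seen during training and cameras the detector meets for the first time.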
Christian Leinenbach, Sergey Shevchik, Rafal Wróbel, Marc Leparoux