Audio deepfake

An audio deepfake (also known as voice cloning) is synthetic speech generated with artificial intelligence to sound like a specific person saying things they did not say. The technology was initially developed for beneficial applications. For example, it can be used to produce audiobooks and to help people who have lost their voices (due to throat disease or other medical problems) get them back. Commercially, it has opened the door to several opportunities, including more personalized digital assistants, natural-sounding text-to-speech, and speech translation services.

Audio deepfakes, more recently also called audio manipulations, are becoming widely accessible on simple mobile devices and personal computers. These tools have also been used to spread misinformation. This has raised cybersecurity concerns among the global public about the side effects of audio deepfakes, including their possible role in disseminating misinformation and disinformation on audio-based social media platforms. They can be used as a logical access voice spoofing technique and to manipulate public opinion for propaganda, defamation, or terrorism. Vast amounts of voice recordings are transmitted over the Internet every day, and spoofing detection is challenging.

Audio deepfake attackers have targeted individuals and organizations, including politicians and governments. In early 2020, scammers used artificial intelligence-based software to impersonate the voice of a CEO and authorize a money transfer of about $35 million through a phone call. According to a 2023 global McAfee survey, one person in ten reported having been targeted by an AI voice cloning scam; 77% of these targets reported losing money to the scam. Audio deepfakes could also pose a danger to the voice ID systems currently deployed for financial consumers.
Audio deepfakes can be divided into three categories. Replay-based deepfakes are malicious works that aim to reproduce a recording of the interlocutor's voice.

Assessment framework for deepfake detection in real-world situations

Efficient Temporally-Aware DeepFake Detection using H.264 Motion Vectors

Improving Deepfake Detectors against Real-world Perturbations with Amplitude-Phase Switch Augmentation
