Try to imagine former federal Conservative leader Andrew Scheer dressed as Pee Wee Herman, delivering a public service announcement about the dangers of crack cocaine.
Now you don't have to, because someone made a minute-long video of just that. Or rather, someone told a computer to. You can watch the video below.
This video is an example of a "deepfake," a piece of fake media created using deep learning, hence the name. Deep learning is a form of artificial intelligence that enables a machine to learn a particular task on its own. In the case of face-swapping deepfake videos, the task is to convincingly stitch one face onto another.
The technology draws its inspiration from how humans think and learn. Our brain cells, also known as neurons, communicate using chemicals and electrical signals along connections within a complex network. The pattern of communication in response to an input, like visual information from our eyes, determines how we make sense of that information. Learning occurs when repeated activation of certain patterns strengthens the connections between the neurons involved.
Deep learning doesn't operate exactly like this, but it borrows concepts that make learning possible in humans, like the communication between neurons, and simulates them mathematically. The result is artificial neural networks — sets of instructions that are organized in many interconnected layers. These networks look for features and patterns within information to learn a particular task. As they repeat their tasks over and over again, they learn by making small adjustments each time and determining whether those adjustments are improvements or not.
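To make the idea of learning through small adjustments concrete, here is a minimal sketch of a single artificial neuron written in plain Python. It is not how any particular deepfake tool is built: the toy task (outputting 1 only when both inputs are 1), the function names, and the settings `rounds` and `step_size` are all assumptions chosen for illustration. Real networks stack many such units in layers and rely on specialized libraries, and instead of checking each adjustment by trial and error they calculate which direction of adjustment lowers the error, which is what the `slope` line does here.

```python
import math


def sigmoid(value):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-value))


def predict(weights, bias, inputs):
    """Output of one artificial neuron: weighted sum of inputs, then sigmoid."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


def average_error(weights, bias, examples):
    """Mean squared difference between the neuron's outputs and the targets."""
    squared = [(predict(weights, bias, x) - y) ** 2 for x, y in examples]
    return sum(squared) / len(squared)


def train(examples, rounds=5000, step_size=0.5):
    """Repeat the task many times, nudging each connection a little each time."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(rounds):
        for inputs, target in examples:
            output = predict(weights, bias, inputs)
            # Direction in which the squared error grows as the weighted sum
            # grows (constant factors are folded into step_size).
            slope = (output - target) * output * (1.0 - output)
            # Move every connection a small step in the opposite direction.
            weights = [w - step_size * slope * x for w, x in zip(weights, inputs)]
            bias -= step_size * slope
    return weights, bias


# Hypothetical toy task: answer 1 only when both inputs are 1 (logical AND).
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    print("error before training:", average_error([0.0, 0.0], 0.0, AND_EXAMPLES))
    weights, bias = train(AND_EXAMPLES)
    print("error after training: ", average_error(weights, bias, AND_EXAMPLES))
```

Running the script prints the neuron's average error before and after training. The second number should be smaller, because each pass strengthens or weakens the connections slightly in whichever direction reduces the mistakes.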
To swap two faces in a video, like Scheer's and Herman's, the neural network breaks the task into two key steps. It must learn to create simplified representations of faces from images, and to reconstruct proper images of faces from those representations. It trains itself by repeatedly practicing on many different images of the two faces. These training images are often large sets of photos or video frames. Once the network becomes proficient at simplifying and reconstructing images of both faces, it creates simplified representations of Herman's face from each frame of the original PSA video. It then applies its understanding of how to reconstruct images of Scheer's face to those representations. The final product is a video with Scheer's face, superimposed on Herman's body, appearing to say Herman's lines.
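The two steps described above, simplifying and then reconstructing, can be sketched with a toy model that has one shared "simplifier" (an encoder) and one "reconstructor" (a decoder) per face. Everything in this sketch is an assumption made for illustration: the "images" are invented lists of four numbers rather than real photos, the class `TinyFaceSwapper` and its methods are hypothetical names, and the model uses simple linear arithmetic where real deepfake software uses deep networks trained on thousands of frames. The output of such a small model is only a rough stand-in for a swapped face.

```python
import random

# Invented 4-number "images": a base face per person plus a shared expression.
BASE_FACES = {"herman": [0.9, 0.1, 0.8, 0.2], "scheer": [0.2, 0.7, 0.1, 0.9]}
EXPRESSION = [0.1, 0.1, 0.3, -0.3]


def make_face(person, amount):
    """Build a toy image of `person` showing `amount` of the expression."""
    return [b + amount * e for b, e in zip(BASE_FACES[person], EXPRESSION)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class TinyFaceSwapper:
    """One encoder shared by both faces, plus a separate decoder for each face."""

    def __init__(self, image_size=4, code_size=2, seed=0):
        rng = random.Random(seed)

        def small_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.encoder = small_matrix(code_size, image_size)
        self.decoders = {p: small_matrix(image_size, code_size) for p in BASE_FACES}

    def encode(self, image):
        """Step 1: turn an image into a simplified representation."""
        return matvec(self.encoder, image)

    def decode(self, person, code):
        """Step 2: rebuild an image of `person` from a representation."""
        return matvec(self.decoders[person], code)

    def train_step(self, person, image, step_size):
        """Practice on one image; return the squared error before adjusting."""
        code = self.encode(image)
        error = [o - t for o, t in zip(self.decode(person, code), image)]
        decoder = self.decoders[person]
        # How the error changes with each number in the representation.
        code_slope = [sum(decoder[i][j] * error[i] for i in range(len(error)))
                      for j in range(len(code))]
        for i, row in enumerate(decoder):        # adjust this person's decoder
            for j in range(len(row)):
                row[j] -= step_size * error[i] * code[j]
        for j, row in enumerate(self.encoder):   # adjust the shared encoder
            for k in range(len(row)):
                row[k] -= step_size * code_slope[j] * image[k]
        return sum(e * e for e in error)


TRAINING_FACES = [(person, make_face(person, amount))
                  for person in BASE_FACES
                  for amount in (-1.0, -0.5, 0.0, 0.5, 1.0)]


def train(model, faces, epochs=1500, step_size=0.02):
    """Practice repeatedly on every image; record the total error per pass."""
    history = []
    for _ in range(epochs):
        history.append(sum(model.train_step(p, img, step_size) for p, img in faces))
    return history


def swap_face(model, image, target_person):
    """Encode one person's image, then decode it with the other person's decoder."""
    return model.decode(target_person, model.encode(image))


if __name__ == "__main__":
    model = TinyFaceSwapper()
    history = train(model, TRAINING_FACES)
    print("total error, first pass:", history[0])
    print("total error, last pass: ", history[-1])
    print("swapped frame:", swap_face(model, make_face("herman", 0.5), "scheer"))
```

The design choice that matters is the sharing. Because both faces pass through the same encoder, the representation of a Herman frame is something the Scheer decoder can also work from, and `swap_face` exploits that by pairing the shared encoder with the other person's decoder.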
The Andrew Scheer-Pee Wee Herman video may look like an obvious fake, but it is just one of many deepfake videos that have emerged over the past few years, some more believable than others. The development of new techniques, such as mapping facial landmarks, like the eyes, nose, and chin, allows neural networks to create more realistic deepfakes in less time and with less training data.
By applying the same deep learning approach to sound, creating audio of someone saying things that they never actually said is also possible. Researchers from Stanford University, the Max Planck Institute for Informatics, Princeton University, and Adobe Research recently unveiled a new neural network that learns the basic features of sounds in speech and the associated mouth shapes. It can then manipulate videos to make it look and sound like a person said something completely different.
All it takes is telling the neural network what the new words should be. This technology could help the media industry clean up footage by adding new words that a speaker forgot, removing mistakes or filler words like "um", or rearranging sentences or sections seamlessly. It also opens up many possibilities for translations and voiceovers. A performer could appear to speak another language perfectly in their own voice without ever having to actually learn it.
Deepfakes can even bring people back from the dead. At the Salvador Dalí Museum in St. Petersburg, Florida, visitors can interact with deepfake footage of Salvador Dalí and learn about the art from the man himself. He'll even take selfies with them. This particular deepfake involved training a neural network to map Dalí's face over footage of a body double and the speech from a voice impersonator.
You can watch the video below.
Although deepfakes have a lot of interesting and positive applications, it's not hard to imagine the potential for misuse and harm. In fact, some of the earliest deepfakes were sexually explicit videos with the faces of celebrities superimposed on other people's bodies, and the same approach could be used to create compromising material for blackmail or bullying. Deepfakes also present a serious threat as a means of fraud, falsifying evidence, and impersonation.
The ability to make it appear as though someone said things that they never actually did also raises serious concerns for politics and democracy, especially in light of increasing polarization and division. Deepfakes of politicians could incite violence, damage foreign relations, and spread misinformation.
While deepfakes aren't perfect quite yet, they don't have to be to have an effect. Psychologists have repeatedly demonstrated that misinformation lingers in the mind even after it's disproven. We often use our sense of familiarity to judge if information is accurate and can misinterpret our familiarity with a piece of misinformation as an indicator that it's somewhat accurate. Researchers at the University of Warwick in the UK found that even very obviously faked images can have a small effect on belief.
The possibility that a piece of video or audio could be fake also creates plausible deniability. As deepfakes become more and more believable, denying that things ever happened becomes more and more powerful.
With this looming threat, what can be done to protect society from the dangers of deepfakes?
Just as neural networks can learn to create deepfakes, they can also learn to detect them. Developers of deepfake neural networks could also introduce markers that make deepfakes easier to detect if misused by others. The tricky part about any technological solution is that it will drive an arms race between deepfake creation and detection. Over time, the deepfakes will become more and more sophisticated and harder to identify.
Researchers at the University of California Davis and the University of Virginia found that internet skills and experience with photo-editing software, like Photoshop, predicted how well people could tell if a still image was faked. Educating people on the existence of deepfake technology and how deepfakes are made could help them recognize fakes and support them in being more critical about the media they see. Some red flags to look out for are unrealistic-looking mouths and teeth, a lack of blinking, out-of-sync audio and video, and abnormalities in the colours or textures, particularly in the backgrounds. However, as deepfake technology advances, these may not be useful clues in the future.
Governments and social media platforms will need to change their policies to prevent the misuse of deepfakes and hold perpetrators accountable. Although the Canadian government doesn't have any laws specifically regarding deepfakes, it has begun to think about how existing laws, including those on defamation and election tampering as well as the Criminal Code, could apply to the making and sharing of harmful deepfakes.
Twitter recently completed a survey asking its users how they wanted the platform to handle deepfake content: by flagging it as fake or by removing it entirely. At the start of this year, Facebook announced a ban on deepfake videos. However, the company was clear that videos edited without the help of deep learning would remain on the platform.
In contrast to deepfakes, videos manipulated by traditional video editing are now often called "dumbfakes", "cheapfakes", or "shallowfakes". They may actually pose a more pressing threat, considering the large amount of time and computing power that it takes to create somewhat convincing deepfakes at the moment. Dumbfakes, on the other hand, only require a little basic video editing to quickly produce and share misleading narratives.
For example, a video of US House Speaker Nancy Pelosi was slowed down to make her appear impaired. Closer to home, a shortened, out-of-context clip of Prime Minister Justin Trudeau at a G20 summit appeared to show him being ignored by Brazilian President Jair Bolsonaro when attempting to shake hands. In reality, Bolsonaro turned away to shake hands with another person before shaking hands with Trudeau.
Deepfakes have the potential to erode our sense of what is real and can be trusted. As we move towards a world where seeing may no longer be believing, we need to stay vigilant and remember to always think critically.
Tim Li is a science communication student at Laurentian University.