Seeing Is Believing

Say What? A crudely manipulated video of Nancy Pelosi that seemed to show her slurring her words went viral in May, egged on in part by President Trump.

Deep Dive Andrew Grotto, a research fellow at Stanford University’s Hoover Institution and Center for International Security and Cooperation, is studying how deepfakes impact the electoral process and messaging.

The future of misinformation is here. It reared its ugly head in May in the form of a doctored video of House Speaker Nancy Pelosi—manipulated to show her slurring her words, as if she were drunk. The trick was simple; the footage of Pelosi, speaking at a conference on May 22, was merely slowed down 25 percent. In the world of video editing, it’s child’s play.

The video went viral shortly after Pelosi said that Donald Trump’s family should stage an intervention with the president “for the good of the country.” The faked video surfaced on Facebook, where it was viewed more than 2 million times within a few hours. It was also shared by Trump lawyer and apologist Rudy Giuliani with a caption (since deleted) that read: “omg, is she drunk or having a stroke?” followed by “She’s drunk!!!”

The incident called to mind an even cruder video dust-up in 2018 involving footage of CNN reporter Jim Acosta, manipulated to give the impression that he had behaved aggressively toward a White House intern at a press conference. The deceptive clip was actually released by press secretary Sarah Huckabee Sanders.

The country’s most powerful people lending their authority to objectively bogus video as a political weapon is enraging enough. But compared to what’s coming over the digital media horizon, the Acosta and Pelosi videos will soon look and feel as antique as a Buster Keaton short alongside Avengers: Endgame.

Cue Bachman-Turner Overdrive’s “You Ain’t Seen Nothin’ Yet.” Welcome to the Age of Deepfakes.

The term “deepfakes” is a portmanteau of “deep learning,” a form of artificial intelligence-assisted machine learning, and “fake.” It’s an emerging technology that can potentially put the kind of highly realistic video and audio manipulation once only accessible to Hollywood in the hands of state intelligence agencies, corporations, hackers, pornographers or any 14-year-old with a decent laptop and a taste for trolling. In its most obvious application, a deepfake can create an utterly convincing video of any celebrity, politician or even any regular citizen doing or saying something that they never said or did. (For the record, the Pelosi video is not technically a deepfake; it is to deepfakes what a stick figure drawing would be to a high Renaissance painting.)

The buzz about deepfakes has penetrated nearly every realm of the broader culture—media, academia, tech, national security, entertainment—and it’s not difficult to understand why. In the constant push-pull struggle between truth and lies, already a confounding problem of the Internet Age, deepfakes represent that point in the superhero movie when the cackling bad guy reveals his doomsday weapon to the thunderstruck masses.

“If 9/11 is a 10,” says former White House cybersecurity director Andrew Grotto, “and let’s say the Target breach [a 2013 data breach at the retailer that affected 40 million credit card customers] is a 1, I would put this at about a 6 or 7.”

Deepfake videos present a fundamentally false version of real life. It’s a deception powerful enough to pass the human mind’s Turing test—a lie on steroids.

In many cases, it’s done for entertainment value and we’re all in on the joke. In Weird Al Yankovic’s face-swap masterpiece, “Perform This Way”—a parody of Lady Gaga’s “Born This Way”—nobody actually believes that Weird Al has the body of a female supermodel. No historian has to debunk the idea that Forrest Gump once met President John F. Kennedy.

But the technology has now advanced to the point where it can potentially be weaponized to inflict lasting damage on individuals, groups, and even economic and political systems. For generations, video and audio have enjoyed almost absolute credibility. Those days are coming to an abrupt and disorienting end. Whether it’s putting scandalous words into the mouth of a politician or creating a phony emergency or crisis just to sow chaos, the day is fast approaching when deepfakes could be used for exploitation, extortion, malicious attack or even terrorism.

Of course, creating fake videos that destroy another person’s reputation, whether it’s to exact revenge or ransom, is only the most individualized and small-scale nightmare of deepfakes. If you can destroy one person, why not whole groups or categories of people? Think of the effect of a convincing but completely fake video of an American soldier burning a Koran, or a cop choking an unarmed protester, or an undocumented immigrant killing an American citizen at the border. Real violence could follow fake violence. Think of a deepfake video that could cripple the financial markets, undermine the credibility of a free election, or impel an impetuous and ill-informed president to reach for the nuclear football.

Why now?

Ultimately, the story of deepfakes is a story of technology reaching a particular threshold. At least since the dawn of television, generations have grown up developing deeply sophisticated skill sets in interpreting audiovisual imagery. When you spend a lifetime looking at visual information on a screen, you get good at “reading” it, much like a lion “reads” the African savanna.

Discerning the real from the phony isn’t merely a vestige of the video age. It was a challenge even when the dominant media platform wasn’t the screen but the printed word. Psychologist Stephen Greenspan, author of the book Annals of Gullibility, says that the tensions between credulity and skepticism have been baked into the American experience from the very beginning.

“The first act of public education was in the Massachusetts Bay Colony, long before the country even existed,” said Greenspan, whose new book Anatomy of Foolishness is due out in August. “The purpose of that act was to arm children against the blandishments and temptations of Satan. It was even called ‘The Old Deluder Act.’”

The advent of still photography, movies, television and digital media each in turn added a scary new dimension to the brain’s struggle to tell true from false. At one point, video technology was able to create realistic imagery out of whole cloth, but it quickly ran into a problem known as the “uncanny valley effect,” in which the closer technology got to reality, the more dissonant small differences would appear to a sophisticated viewer. Deepfakes, as they now exist, are still dealing with that specific problem, but the fear is that they will soon transcend the uncanny valley and be able to produce fake videos that are indistinguishable from reality.

“It would be a disaster,” Greenspan says of the specter of deepfakes, “especially if it’s used by unscrupulous political types. It’s definitely scary because it exploits our built-in tendencies toward gullibility.”

How they work

Deepfakes are the product of machine learning and artificial intelligence. The applications that create them work from dueling sets of algorithms known as generative adversarial networks, or GANs. Working from a giant database of video and still images, this technology pits two algorithms, one known as the “generator” and the other the “discriminator,” against each other.

Imagine two rival football coaches, or chess masters, developing increasingly complicated and sophisticated offensive and defensive schemes to answer each other. The GAN process works when the generator and discriminator learn from each other, creating a kind of technological “natural selection.” This evolutionary dynamic accelerates the means by which the algorithm can fool the human eye and ear.
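For readers who want to see that tug-of-war in miniature, here is a rough sketch in Python. It is nothing like a real deepfake tool: the "real data" are just numbers clustered around 4, the generator can only learn a single offset (b), and the discriminator is a one-line classifier (w and c). All of those names and values, along with the function names train and sigmoid, are assumptions made for illustration, not anything taken from an actual deepfake application.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function, returns a value in [0, 1]."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train(steps=200, lr=0.05, batch=64, seed=0):
    """Toy 1-D adversarial loop (illustrative assumptions throughout).

    Real data: numbers drawn around 4.0.
    Generator: random noise plus a learned offset b.
    Discriminator: D(x) = sigmoid(w * x + c), its guess that x is real.
    Returns a list of (b, w) pairs, one per round.
    """
    rng = random.Random(seed)
    b = 0.0          # generator parameter
    w, c = 0.0, 0.0  # discriminator parameters
    history = []
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        fake = [rng.gauss(0.0, 0.5) + b for _ in range(batch)]

        # Discriminator turn: push D(real) up and D(fake) down.
        gw = gc = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            gw += (1.0 - d) * x
            gc += 1.0 - d
        for x in fake:
            d = sigmoid(w * x + c)
            gw -= d * x
            gc -= d
        w += lr * gw / batch
        c += lr * gc / batch

        # Generator turn: nudge b so the updated discriminator
        # rates the same fakes as more likely to be real.
        gb = 0.0
        for x in fake:
            d = sigmoid(w * x + c)
            gb += (1.0 - d) * w
        b += lr * gb / batch

        history.append((b, w))
    return history


if __name__ == "__main__":
    for i, (b, w) in enumerate(train(steps=5), start=1):
        print(f"round {i}: generator offset b={b:.4f}, discriminator weight w={w:.4f}")
```

After a single round, both numbers come out positive: the discriminator has noticed that bigger values look more "real," and the generator has already started shifting its fakes in that direction. Over many rounds, toy setups like this one tend to chase each other back and forth rather than settle neatly, which is part of why real GANs are hard to train.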

Naturally, the entertainment industry has been at the forefront of this technology, and the current obsession with deepfakes might have begun with the release in December 2016 of Rogue One, the Star Wars spin-off that featured a CGI-created image of the late Carrie Fisher as a young Princess Leia. A year later, an anonymous Reddit user posted some deepfake celebrity porn videos with a tool he created called FakeApp. Shortly after that, tech reporter Samantha Cole wrote a piece for Vice’s Motherboard blog on the phenomenon headlined “AI-assisted Fake Porn is Here and We’re all Fucked.” A couple of months later, comedian and filmmaker Jordan Peele created a video in which he put words in the mouth of former President Obama as a way to illustrate the incipient dangers of deepfakes. Reddit banned subreddits having to do with fake celebrity porn, and other platforms, including PornHub and Twitter, banned deepfakes as well. Since then, everyone from PBS to Samantha Bee has dutifully taken a turn in ringing the alarm bells to warn consumers (and, probably, to inspire mischief-makers).

The deepfakes panic had begun.

Freak Out?

Twenty years ago, the media universe—a Facebook-less, Twitter-less, YouTube-less media universe, we should add—bought into a tech-inspired doomsday narrative known as “Y2K,” which posited that the world’s computer systems would seize up, or otherwise go haywire in a number of unforeseen ways, the minute the clock turned over to Jan. 1, 2000. Y2K turned out to be a giant nothing-burger and now it’s merely a punchline for comically wrongheaded fears.

In this case, Y2K is worth remembering as an illustration of what can happen when the media pile on to a tech-apocalypse narrative. The echoing effects can inflate a perceived threat and even create a monsters-under-the-bed problem. In the case of deepfakes, the media freak-out might also draw attention away from a more nuanced approach to a coming problem.

Andrew Grotto is a research fellow at Stanford’s Hoover Institution and a research scholar at the Center for International Security and Cooperation, also at Stanford. Before that, he served as the senior director for cybersecurity policy at the White House in the Obama and Trump administrations. Grotto’s interest in deepfakes is in how they will affect the electoral process and political messaging.

Grotto has been to Capitol Hill and to Sacramento to talk to federal and state lawmakers about the threats posed by deepfakes. Most of the legislators he talked to had never heard of deepfakes and were alarmed at what they could mean for their electoral prospects.

“I told them, ‘Do you want to live and operate in a world where your opponents can literally put words in your mouth?’ And I argued that they as candidates and leaders of their parties ought to be thinking about whether there’s some common interest to develop some kind of norm of restraint.”

Grotto couches his hope that deepfakes will not have a large influence on electoral politics in the language of the Cold War. “There’s almost a mutually-assured-destruction logic to this,” he says, applying a term used to explain why the U.S. and the Soviet Union didn’t start a nuclear war against each other. In other words, neither side will use such a powerful political weapon because they’ll be petrified it will then be used against them.

One of the politicians that Grotto impressed in Sacramento was Democrat Marc Berman, who represents California’s 24th District in the state assembly. Berman chairs the Assembly’s Elections and Redistricting Committee, and he’s authored a bill that would criminalize the creation or the distribution of any video or audio recording that is “likely to deceive any person who views the recording” or that is likely to “defame, slander or embarrass the subject of the recording.” The new law would create exceptions for satire, parody or anything that is clearly labeled as fake. The bill (AB 602) is set to leave the judiciary committee and reach the Assembly floor this month.

“I tell you, people have brought up First Amendment concerns,” Berman says over the phone. “It’s been 11 years since I graduated law school, but I don’t recall freedom of speech meaning you are free to put your speech in my mouth.”

The Electronic Frontier Foundation, which for almost three decades has fought government regulation in the name of internet civil liberties, is pushing back against any legislative efforts to deal with deepfakes. In a media statement, the EFF conceded that deepfakes could create mischief and chaos, but contended that existing laws pertaining to extortion, harassment and defamation are up to the task of protecting people from the worst effects.

Berman, however, is having none of that argument: “Rather than being reactive, like during the 2016 [presidential] campaign when nefarious actors did a lot of bad things using social media that we didn’t anticipate—and only now are we reacting to it—let’s try to anticipate what they’re going to do and get ahead of it.”

Good & Evil

Are there potentially positive uses for deepfake technology? In the United States of Entertainment, the horizons are boundless, not only for all future Weird Al videos and Star Wars sequels, but for entirely new genres of art yet to be born. Who could doubt that Hollywood’s CGI revolution will continue to evolve in dazzling new directions? Maybe there’s another Marlon Brando movie or Prince video in our collective future.

The Electronic Frontier Foundation touts something called “consensual vanity or novelty pornography.” Deepfakes might allow people to change their physical appearances online as a way of identity protection. There could be therapeutic benefits for survivors of sexual abuse or PTSD to have video conferencing therapy without showing their faces.

Stanford’s Grotto envisions a kind of “benign deception” application that would allow a campaigning politician to essentially be in more than one place at a time, as well as benefits in get-out-the-vote campaigns.

But here at the top of the roller coaster, the potential downsides look much more vivid and prominent than any speculative positive effect. Deepfakes could add a wrinkle of complication into a variety of legitimate pursuits. For example, in the realm of journalism, imagine how the need to verify some piece of video or audio could slow down or stymie a big investigation. Think of what deepfakes could do on the dating scene, in which online dating is already consumed with all levels of fakeness. Do video games, virtual reality apps and other online participatory worlds need to be any more beguiling? Put me in a virtual cocktail party with my favorite artists and celebrities, and I’ll be ready to hook up the catheter and the IV drip to stay in that world for as long as possible.

If the Internet Age has taught us anything, it’s that trolls are inevitable, even indomitable. The last two decades have given us a dispiriting range of scourges, from Alex Jones to revenge porn. Trolling has even proven to be a winning strategy for taking the White House.

“Let’s keep walking down the malign path here,” said former White House cybersecurity chief Grotto from his Stanford office, speculating on how deep the wormhole could go. Grotto brings up the specter of what he calls “deepfake for text.” He says it’s inevitable that soon there will be AI-powered chatbots programmed to rile up, radicalize and recruit humans to extremist causes.

What now?

In addressing the threat of deepfakes, most security experts and technologists agree that there is no vaccine. Watermarking technology could be inserted into the metadata of audio and video material. Even in the absence of legislation, app stores would probably require such watermarking be included on any deepfake app. But how long would it be before someone figured out a way to fake the watermark? There’s some speculation that celebrities and politicians might opt for 24/7 “lifelogging,” digital auto-surveillance of their every move to give them an alibi against any fake video.
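How hard a watermark is to fake depends on how it is built. The sketch below shows one possibility, which is an assumption for illustration and not a description of any existing watermarking product: the creator computes a keyed fingerprint (an HMAC) of the media file and stores it in the metadata. The function names sign_media and verify_media and the sample key are hypothetical. A real provenance system would more likely use public-key signatures so that anyone could check a file without holding the secret.

```python
import hashlib
import hmac


def sign_media(media_bytes: bytes, secret_key: bytes) -> str:
    """Return a hex tag tied to both the key and these exact bytes."""
    return hmac.new(secret_key, media_bytes, hashlib.sha256).hexdigest()


def verify_media(media_bytes: bytes, tag: str, secret_key: bytes) -> bool:
    """Check that the tag in the metadata matches the media it travels with."""
    expected = sign_media(media_bytes, secret_key)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"hypothetical-studio-key"       # illustrative only
    original = b"...raw bytes of a video file..."
    tag = sign_media(original, key)

    print(verify_media(original, tag, key))            # untouched file
    print(verify_media(original + b"!", tag, key))     # altered file, old tag
```

Under this scheme, copying a valid tag onto altered footage fails the check, so the weak point is less the math than the secret key itself, and whoever manages to steal or leak it.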

Deepfakes are still in the crude stages of development. “It’s still hard to make it work,” Grotto says. “The tools aren’t to the point where someone can just sit down without a ton of experience and make something” convincing.

He said the 2020 presidential election may be plagued by many things, but deepfakes probably won’t be one of them. After that, though? “By 2022, 2024, that’s when the tools get better. That’s when the barriers to entry really start to drop.”

This moment, he says, isn’t a time to panic. It’s a time to develop policies and norms to contain the worst excesses of the technology, all while we’re still at the top of the roller coaster. Grotto says convincing politicians and their parties to resist the technology, developing legal and voluntary measures for platforms and developers, and labeling and enforcing rules will all have positive effects in slowing down the slide into deepfake hell.

“I think we have a few years to get our heads around it and decide what kind of world we want to live in, and what the right set of policy interventions look like,” he says. “But talk to me in five years, and maybe my hair will be on fire.”