AI’s Latest Misuse: Deepfake of a Principal’s Voice Raises Alarm

The latest criminal case linked to artificial intelligence surfaced at a high school in Baltimore County, Maryland.

Last week, a Maryland high school became the center of the most recent criminal case involving artificial intelligence, after police said a fabricated recording of the principal’s voice had been used to falsely depict him as racist.

Experts say the case underscores that everyone, not just public figures like politicians and celebrities, needs to recognize the dangers of increasingly sophisticated deepfake technology.

Hany Farid, a professor who specializes in digital forensics and misinformation at the University of California, Berkeley, said everyone is vulnerable to such attacks, and the perpetrators can come from any background.

Here is a look at some recent instances in which AI has been used for malicious purposes.

AI has become far more accessible in recent years. Manipulating recorded sound and images is not new, but the ease with which anyone can now do it is, and manipulated content spreads quickly on social media, magnifying its impact.

The fake audio clip impersonating the principal illustrates what a subset of artificial intelligence known as generative AI can do: create highly realistic new images, videos and audio clips. The technology has also grown cheaper and easier to use in recent years, putting it within reach of anyone with an internet connection.

According to Farid, the Berkeley professor, generative AI has become markedly more accessible over the past year. For free or a small fee, people can use online services where they upload a short audio clip, typically around 30 seconds long.

The clip can come from a voicemail, a social media post or a covert recording. Machine learning algorithms then analyze and replicate the person’s speech patterns, generating cloned speech from whatever text is typed in.
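To make that workflow concrete, here is a minimal sketch of what such a service’s interface typically looks like from a programmer’s side. The endpoint, field names and two-step upload-then-synthesize flow are hypothetical placeholders for illustration, not any real vendor’s API.

    import requests

    API_URL = "https://api.example-voice.invalid/v1"  # hypothetical service endpoint
    API_KEY = "YOUR_API_KEY"                          # placeholder credential

    def clone_voice(sample_path, name):
        """Upload a ~30-second audio sample; the service returns a voice ID."""
        with open(sample_path, "rb") as f:
            resp = requests.post(
                f"{API_URL}/voices",
                headers={"Authorization": f"Bearer {API_KEY}"},
                files={"sample": f},
                data={"name": name},
            )
        resp.raise_for_status()
        return resp.json()["voice_id"]

    def synthesize(voice_id, text, out_path):
        """Generate speech in the cloned voice from arbitrary typed text."""
        resp = requests.post(
            f"{API_URL}/synthesize",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"voice_id": voice_id, "text": text},
        )
        resp.raise_for_status()
        with open(out_path, "wb") as f:
            f.write(resp.content)

The point of the sketch is how little the process demands: one short sample and a line of text are enough to produce speech the target never said.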

Farid predicts the tools will only become more powerful and easier to use, including those for manipulating video.

In the Maryland case, authorities say Dazhon Darien, the athletic director at Pikesville High, cloned Principal Eric Eiswert’s voice. The fabricated recording contained racist and antisemitic remarks, police said. It was first emailed to some teachers, then spread quickly across social media.

The recording surfaced as Eiswert was raising concerns about Darien’s job performance and alleged misappropriation of school funds, police said.

In the fallout, Eiswert was placed on leave and police posted security at his home. The school was flooded with angry phone calls, while hateful messages piled up on social media.

Detectives enlisted outside experts to analyze the recording. One expert noted that it “contained traces of AI-generated content with human editing after the fact,” according to court records. A second analysis, by Farid, found that “multiple recordings were spliced together,” the records say.
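Forensic splice analysis is far more sophisticated than anything that fits in a few lines, but a toy sketch can show the kind of signal-level cue analysts start from: abrupt spectral changes at the points where separate recordings meet. The function and threshold below are illustrative assumptions, not a description of the tools actually used in this case.

    import numpy as np
    import librosa  # widely used audio-analysis library

    def abrupt_spectral_changes(path, z_threshold=3.0):
        """Return timestamps where the spectrum jumps sharply between frames,
        one crude cue that separate recordings may have been spliced together."""
        y, sr = librosa.load(path, sr=None, mono=True)
        S = np.abs(librosa.stft(y))                                  # magnitude spectrogram
        flux = ((np.diff(S, axis=1).clip(min=0)) ** 2).sum(axis=0)   # frame-to-frame change
        z = (flux - flux.mean()) / (flux.std() + 1e-9)               # standardize
        suspect_frames = np.where(z > z_threshold)[0] + 1            # +1: diff shifts indices
        return librosa.frames_to_time(suspect_frames, sr=sr)

A spike in this measure flags a moment worth listening to closely; real forensic work layers many such cues, along with metadata and model-specific artifacts, before drawing conclusions.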

Farid told The Associated Press that questions remain about exactly how the recording was created, and he has not definitively confirmed it was entirely AI-generated. Still, he said, the Maryland case is a warning that the technology’s advancing capabilities demand better regulation.

Audio-based disinformation is a particular concern because the technology has advanced faster than the human ear’s ability to detect signs of manipulation. Discrepancies in videos and images, by contrast, are often easier to spot.

People have already cloned voices to deceive others for money: some have impersonated kidnapped children over the phone to extort ransom from their parents, while others have posed as company executives urgently requesting funds.

During this year’s New Hampshire primary, AI-generated robocalls mimicked President Joe Biden’s voice in an attempt to discourage Democratic voters from turning out. The episode underscores experts’ warnings of a potential surge in AI-generated disinformation targeting elections this year.

WHAT CAN BE DONE?

The troubling trends extend beyond audio. Experts warn of programs that can generate fake nude images of clothed people without their consent, including minors. Singer Taylor Swift was reportedly targeted this way recently.

Addressing these challenges will take concerted effort. Most providers of AI voice-generating technology say they prohibit harmful uses, but enforcement varies. Some vendors require a voice signature, or ask users to recite a unique set of sentences, before a voice can be cloned.

Larger tech companies like Meta (formerly Facebook) and OpenAI, the developer of ChatGPT, restrict access to their technology to a select group of trusted users due to the risks involved.

Farid advocates more robust measures, such as requiring users to provide phone numbers and credit cards, so that misuse can be traced back to them. Another proposed solution is digital watermarking of recordings and images to make them easier to track and authenticate.
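As a rough illustration of the watermarking idea, the sketch below hides a short provenance tag in the least significant bits of 16-bit audio samples. It is a toy example only, assuming raw PCM input: production provenance watermarks are designed to survive compression, re-recording and editing, which this simple scheme would not.

    import numpy as np

    def embed_tag(samples, payload):
        """Hide a byte payload in the least significant bits of int16 PCM audio."""
        bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
        out = samples.copy()
        out[: bits.size] = (out[: bits.size] & ~1) | bits.astype(np.int16)
        return out

    def extract_tag(samples, n_bytes):
        """Read the payload back out of the least significant bits."""
        bits = (samples[: n_bytes * 8] & 1).astype(np.uint8)
        return np.packbits(bits).tobytes()

For example, embed_tag(audio, b"made-by:voice-model") marks a clip so that extract_tag(marked, len(b"made-by:voice-model")) recovers the tag later; the flipped low-order bits are inaudible, which is what lets a watermark travel with a file without changing how it sounds.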