Looking at a spectral plot of the clip on YouTube, Kevin Fu, a computer scientist at the University of Michigan, noted some unusual ripples. He thought he might know what they meant.
Fu’s lab specializes in analyzing the cybersecurity of devices connected to the Internet of Things, such as sensors, pacemakers, RFIDs, and autonomous vehicles. That work has taught him that modern electronics often behave in unpredictable ways and that such devices can be manipulated—intentionally or inadvertently—using carefully crafted acoustic or radio interference. To Fu, the ripples in the spectral readout suggested some kind of interference.
He discussed the AP clip with his frequent collaborator, Wenyuan Xu, a professor at Zhejiang University, in Hangzhou, China, and her Ph.D. student Chen Yan. “We saw it as an interesting puzzle,” says Xu, whose lab works on embedded security, including the use of ultrasound and radio waves to fool voice-recognition systems and self-driving cars. “It was a lot of fun to try to solve it.”
“I thought it might be subharmonics,” Fu recalls. “But a week later, Chen said, ‘No, Kevin, you’re wrong, and I just did an experiment to prove it.’ ”
Yan and Xu started with a fast Fourier transform of the AP audio, which revealed the signal’s exact frequencies and amplitudes. Then, through a series of simulations, Yan showed that an effect known as intermodulation distortion could have produced the AP sound. Intermodulation distortion occurs when two signals having different frequencies combine to produce new signals at the sum and difference of the original frequencies and of their multiples.
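To make the first step concrete, here is a minimal sketch of how a fast Fourier transform picks out the strongest frequencies and their amplitudes in a recording. It is an illustration only, not the researchers' code: the AP audio is not reproduced here, so the test signal, its three tones, and the helper name `dominant_frequencies` are all invented stand-ins.

```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz; a common audio rate (assumed, not taken from the AP clip)

def dominant_frequencies(signal, sample_rate, count=3):
    """Return the `count` strongest (frequency_hz, amplitude) pairs, strongest first."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    # Scale so that a sine of amplitude A falling exactly on a bin reads as A.
    amplitudes = 2.0 * np.abs(spectrum) / n
    strongest = np.argsort(amplitudes)[::-1][:count]
    return [(float(freqs[i]), float(amplitudes[i])) for i in strongest]

# Hypothetical stand-in for a recording: one second containing three tones.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
recording = (1.0 * np.sin(2 * np.pi * 7000 * t)
             + 0.4 * np.sin(2 * np.pi * 6500 * t)
             + 0.2 * np.sin(2 * np.pi * 7500 * t))

peaks = dominant_frequencies(recording, SAMPLE_RATE)
for freq_hz, amplitude in peaks:
    print(f"{freq_hz:8.1f} Hz  amplitude {amplitude:.3f}")
```

The tones here complete a whole number of cycles in the one-second window, so each lands on a single frequency bin. A real recording would not be so tidy and would normally call for a window function and a proper peak search.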
When signal processing equipment behaves in a nonlinear way, it can cause this type of distortion. For example, Fu says, microphone circuitry can exhibit nonlinear behavior, and waves propagating through air can also behave in a nonlinear fashion. “As acoustic waves containing multiple frequencies travel through a nonlinear system, you can get these bizarre ripples in the spectrum of the signal,” he explains. “At the same time, intermodulation distortion can produce lower-frequency signals than the original signals. In other words, inaudible ultrasonic waves going through air can produce audible by-products.”
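A short worked example shows where the lower-frequency by-product comes from. It assumes the simplest textbook form of nonlinearity, a small squared term added to an otherwise linear response; the coefficients a_1 and a_2 are illustrative symbols, not measurements of any real microphone or of air.

```latex
% Assumed model: a linear response plus a small quadratic term.
y(t) = a_1\,x(t) + a_2\,x(t)^2,
\qquad
x(t) = \cos(\omega_1 t) + \cos(\omega_2 t)

% Expanding the square with \cos^2 A = \tfrac{1}{2}(1 + \cos 2A)
% and 2\cos A\cos B = \cos(A - B) + \cos(A + B):
x(t)^2 = 1
       + \tfrac{1}{2}\cos(2\omega_1 t)
       + \tfrac{1}{2}\cos(2\omega_2 t)
       + \cos\bigl((\omega_1 - \omega_2)\,t\bigr)
       + \cos\bigl((\omega_1 + \omega_2)\,t\bigr)
```

The term at the difference frequency is the one that can land in the audible range even when both original tones are ultrasonic.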
Yan followed up the simulations with lab experiments, in which he used two ultrasonic speakers, one emitting a signal at 25 kHz and the other at 32 kHz. When he crossed the two signals, they produced the telltale high-pitched sound at 7 kHz, which equals the difference between the two speakers’ frequencies and matches the frequency in the AP audio. In a nod to the Internet meme “rickrolling,” Yan was even able to embed an ultrasonic version of the Rick Astley song “Never Gonna Give You Up,” which became audible at the point where the two signals crossed.
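The arithmetic of the experiment can be checked numerically. The sketch below passes a 25 kHz tone and a 32 kHz tone through a hypothetical weakly nonlinear system and looks for energy at 7 kHz. It is not Yan's simulation: the quadratic model, its coefficient, the sample rate, and the function names are assumptions chosen to keep the example small.

```python
import numpy as np

SAMPLE_RATE = 192_000   # Hz; assumed, high enough to represent both ultrasonic tones
N_SAMPLES = 19_200      # 0.1 s of signal, which gives 10 Hz frequency bins
F1_HZ, F2_HZ = 25_000, 32_000

t = np.arange(N_SAMPLES) / SAMPLE_RATE
ultrasound = np.sin(2 * np.pi * F1_HZ * t) + np.sin(2 * np.pi * F2_HZ * t)

def nonlinear_system(x, quadratic_gain=0.1):
    """Hypothetical response: the input plus a small squared term (an assumption)."""
    return x + quadratic_gain * x ** 2

def amplitude_spectrum(signal, sample_rate):
    """Return (frequencies_hz, amplitudes) scaled so a bin-centered sine reads as A."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    amplitudes = 2.0 * np.abs(np.fft.rfft(signal)) / len(signal)
    return freqs, amplitudes

def amplitude_at(signal, sample_rate, freq_hz):
    """Amplitude in the frequency bin closest to `freq_hz`."""
    freqs, amplitudes = amplitude_spectrum(signal, sample_rate)
    return float(amplitudes[np.argmin(np.abs(freqs - freq_hz))])

output = nonlinear_system(ultrasound)

# List every component in the audible band (20 Hz to 20 kHz) that stands out.
freqs, amplitudes = amplitude_spectrum(output, SAMPLE_RATE)
audible_mask = (freqs > 20) & (freqs < 20_000) & (amplitudes > 0.01)
audible_hz = [round(float(f)) for f in freqs[audible_mask]]

print("7 kHz amplitude before:", amplitude_at(ultrasound, SAMPLE_RATE, 7_000))
print("7 kHz amplitude after: ", amplitude_at(output, SAMPLE_RATE, 7_000))
print("audible components (Hz):", audible_hz)
```

Before the nonlinearity, the signal has essentially nothing at 7 kHz. Afterward, the difference tone at 32 kHz minus 25 kHz is the only component in the audible band; the other by-products of the squared term (50, 57, and 64 kHz) remain ultrasonic.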
Having reverse engineered the AP audio, Fu, Xu, and Yan then considered what combination of things might have caused the sound at the U.S. embassy in Cuba. “If ultrasound is to blame, then a likely cause was two ultrasonic signals that accidentally interfered with each other, creating an audible side effect,” Fu says. There are existing sources of ultrasound in office environments, such as room-occupancy sensors [see, for example, “How an Ultrasonic Sensor Nearly Derailed a Ph.D. Thesis”]. “Maybe there was also an ultrasonic jammer in the room and an ultrasonic transmitter,” he suggests. “Each device might have been placed there by a different party, completely unaware of the other.”