Machine-learning algorithms can analyze thousands of hours of natural soundscapes in a fraction of the time humans would need.
We stand to learn a great deal about nature if we pay closer attention to it, and scientists worldwide are doing precisely that. Biologists are increasingly placing audio recording devices everywhere from mountaintops to ocean depths to eavesdrop on the groans, shrieks, whistles, and songs of whales, elephants, bats, and, especially, birds.
More than 2,000 electronic ears, for example, will capture the soundscape of California’s Sierra Nevada mountain range this summer, yielding nearly a million hours of audio. To avoid spending several human lifetimes decoding it, researchers are turning to artificial intelligence. The recordings will provide valuable snapshots of animal populations, letting conservationists see in great detail how management strategies affect an entire ecosystem.
Gathering information on which species, and how many individuals, live in a given area is just the start. The Sierra Nevada soundscape holds critical information about how last year’s historic wildfires affected birds living in the region’s various habitats. The recordings can show how different animal populations coped with the disaster and which conservation measures may help them recover more quickly. They may also capture information about interactions among individuals in larger communities: How do mates find each other in a sea of suitors, for example? Scientists can likewise use sound to detect changes in migration timing or population ranges.
Sound-based programs are under way to count insects, assess the impact of light and noise pollution on bird populations, monitor endangered species, and trigger alerts when recorders pick up the sounds of illegal hunting or logging, among other things. “Audio data is a true treasure chest because it holds huge volumes of information,” says ecologist Connor Wood, a Cornell University postdoctoral researcher who leads the Sierra Nevada project. “All we have to do now is think about new ways to share and access [that information].” That is a serious challenge, because extracting useful information from recordings is painfully slow for humans. Fortunately, the newest generation of machine-learning systems can process thousands of hours of data in less than a day, classifying animal species by their calls as they go.
“Artificial intelligence has been the major game changer for us,” says Laurel Symes, assistant director of the Cornell Lab of Ornithology’s Center for Conservation Bioacoustics. She studies acoustic communication in animals such as crickets, frogs, bats, and birds, and she has accumulated months of recordings of katydids (highly vocal long-horned grasshoppers that are an essential part of the food web) in the rain forests of Panama.
This audio holds patterns of breeding behavior and seasonal population variation, but decoding it takes a long time: Symes and three colleagues spent 600 hours classifying the katydid species in just 10 hours of recorded sound. A machine-learning algorithm named KatydID that her team is developing, however, completed the same task while its human creators “went out for a beer,” as Symes puts it.
KatydID relies on a neural network, which Stefan Kahl, a machine-learning researcher at Cornell’s Center for Conservation Bioacoustics and at Chemnitz University of Technology in Germany, describes as “a very, very coarse approximation of the human brain.” Kahl created BirdNET, one of the most widely used avian-sound-recognition systems today, which Wood’s team will use to analyze the Sierra Nevada recordings.
Other researchers are already using BirdNET in France’s Brière Regional Natural Park to track how light and noise pollution affect the dawn chorus. Such systems start by training on many labeled inputs, such as hundreds of recorded bird calls, each tagged with the species it belongs to. The neural network then teaches itself which features link an input (here, a bird’s call) to a label (the bird’s identity). There are millions of these features, and humans have no idea what most of them represent. Older detection programs were only semiautomatic.
To identify a bird from its song, they searched spectrograms (visual representations of audio signals) for known features such as frequency range and note length. For certain animals, this works well. The northern cardinal’s song, for example, often opens with a few extended notes that rise in pitch, followed by rapid notes with a noticeable pitch drop. It can be readily identified from a spectrogram, much as a song can be recognized from sheet music.
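To make the contrast concrete, here is a deliberately crude sketch, in Python, of the kind of hand-written rule those older, semiautomatic detectors relied on: compute a spectrogram and check whether enough energy sits in an expected frequency band for long enough. The 2–8 kHz band and every threshold below are illustrative assumptions, not values from any real system.

```python
# Toy sketch of old-style, feature-based detection: a spectrogram plus
# a hand-written rule. All numbers here are illustrative assumptions.
import numpy as np
from scipy.signal import spectrogram

def looks_like_target_song(audio: np.ndarray, sample_rate: int) -> bool:
    """Crude rule: is most of the energy in a 2-8 kHz band typical of
    many songbirds, with at least one sustained note present?"""
    freqs, times, power = spectrogram(audio, fs=sample_rate, nperseg=1024)

    # Fraction of total energy inside the band of interest.
    band = (freqs >= 2000) & (freqs <= 8000)
    band_energy = power[band].sum() / (power.sum() + 1e-12)

    # Longest run of loud frames, a stand-in for "note length."
    frame_energy = power[band].sum(axis=0)
    active = frame_energy > 5.0 * np.median(frame_energy)
    longest = current = 0
    for is_loud in active:
        current = current + 1 if is_loud else 0
        longest = max(longest, current)
    frame_dt = times[1] - times[0] if len(times) > 1 else 0.0

    return band_energy > 0.6 and longest * frame_dt > 0.2
```

Rules like this are easy to read but brittle, which is exactly the weakness that trips older systems up on more variable calls.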
Other avian calls, however, are more complicated and variable, which can throw older systems off. “You need a lot more than signatures to classify the species,” Kahl notes. Many birds have several songs, and, like many animals, they have regional “dialects”: a white-crowned sparrow from Washington State sounds noticeably different from its Californian counterparts. Machine-learning programs can recognize these nuances. “Let’s say there’s an unreleased Beatles album out today. You haven’t heard the music or the lyrics before, but you know it’s a Beatles song because it sounds like them,” Kahl says.
“That’s what these programs learn to do,” he adds. These programs have also benefited from recent advances in human speech and music processing. Experts at New York University’s Music and Audio Research Laboratory drew on their musical expertise to create BirdVox, a bird-call recognition system, in partnership with Andrew Farnsworth of the Cornell Lab of Ornithology.
It detects and isolates birdsong from background sounds such as frog and insect calls, road and air traffic, and wind and rain, all of which can be surprisingly loud and variable. How much each system knows depends on the quantity of prelabeled recordings available to train it.
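The training setup behind such systems can be sketched in a few lines. The toy example below, in Python with PyTorch, pairs spectrogram “images” with species labels and lets a small convolutional network learn the mapping described earlier; the layer sizes, label count, and random stand-in data are illustrative assumptions, nowhere near the scale of BirdNET or BirdVox.

```python
# Minimal sketch of supervised training on labeled calls: spectrogram
# "images" in, species labels out. Sizes and data are illustrative.
import torch
import torch.nn as nn

NUM_SPECIES = 10                   # assumed label count for the sketch
SPEC_BINS, SPEC_FRAMES = 128, 64   # spectrogram height x width

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, NUM_SPECIES),    # one score per species
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for a batch of prelabeled recordings: random "spectrograms"
# and random species labels, just to show the shape of the loop.
specs = torch.randn(8, 1, SPEC_BINS, SPEC_FRAMES)
labels = torch.randint(0, NUM_SPECIES, (8,))

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(specs), labels)  # compare predictions to labels
    loss.backward()                       # learn features linking call to label
    optimizer.step()
```

In a real pipeline the random tensors would be replaced by spectrograms computed from those prelabeled recordings, which is why the size of the labeled archive matters so much.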
For popular birds, there is already a wealth of material: according to Kahl, about 4.2 million recordings covering roughly 10,000 species are available online. But the majority of the 3,000-plus species BirdNET can identify are found in Europe and North America, and BirdVox focuses on the songs of U.S. birds. “In other regions, [BirdNET] doesn’t do as well with rarer species or those for which there isn’t good data,” says India-based ecologist V. V. Robin.
He is listening for Jerdon’s courser, an endangered nocturnal bird that hasn’t been spotted in more than a decade; Robin and his colleagues have installed recorders in a protected area of southern India in hopes of capturing its call. Since 2009 he has also been recording birds in the Western Ghats, a biodiversity hotspot in southern India, and those recordings have been painstakingly annotated to train locally developed machine-learning algorithms.
Citizen scientists are also helping to fill gaps in the birdsong archive. BirdNET powers a popular mobile app for amateur birders: users record audio clips and submit them through the app, which identifies the singer’s species and saves the recording to the researchers’ archive. According to Kahl, more than 300,000 recordings come in every day.
These machine-learning algorithms still have room to improve. Although they process audio far faster than humans can, they fall short when it comes to picking a signal of interest out of overlapping sounds, a problem some researchers see as AI’s next challenge. Even today’s imperfect models, though, enable large-scale projects that would take humans far too long on their own. “As ecologists, resources like BirdNET encourage us to think big,” Wood says.