Tidzam: AI-based Wildlife Detection, Identification and Geo-localization

The Tidzam wildlife detection system monitors wildlife at Tidmarsh, leveraging 24 custom microphones and 6 cameras deployed across four different areas of the site. Optical and acoustic sensing provide complementary information. For example, the biophony is intrinsically complex in terms of vocalizations (e.g. birds) and diverse in terms of candidate species; however, it is mainly produced by creatures that are not visible. A multi-modal sensing approach can help separate noisy geophony and anthropophony from the desired wildlife signal. One crucial requirement is the system's ability to detect new species arriving in an area, especially in a dynamic restoration program such as Tidmarsh.

Bio-acoustic Classifiers


The Tidmarsh bio-acoustic ecosystem has evolved dramatically over years of restoration progress. Dynamic environments require continuous learning to make classifiers robust to both episodic and permanent acoustic changes, especially concerning the identification of as-yet unseen species. To that end, we developed a semi-automatic database augmentation mechanism. A flow controller limits the volume of extracted recordings and parameterizes the balance between unidentified and uncertain predictions. Our 'Tidplay' platform, introduced below, allows bio-acoustic human experts to annotate and discuss these recordings while building a local acoustic database used to iteratively refine the classifiers. The database is, at this time, composed of 400,000 recordings of 500 ms each, distributed over 66 classes including system failure modes (e.g. microphone crackling or offline), geophonic scenes (e.g. rain, wind, quiet), anthropophonic sounds (e.g. cars, airplanes, human voices), and finally, bio-acoustic events from insects (e.g. crickets, cicadas), amphibians (e.g. spring peepers, green frogs), and bird vocalizations across 42 species.
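
As a rough illustration of how such a flow controller might gate extractions, the Python sketch below categorizes each 500 ms window by its class probabilities and forwards a capped, configurable mix of unidentified and uncertain clips to annotators. The thresholds, budgets, and function names here are assumptions for illustration; the deployed Tidzam controller may work differently.

```python
import numpy as np

# Hypothetical parameters: cap on extracted clips and the mix between
# unidentified and uncertain examples sent to annotators.
MAX_CLIPS_PER_HOUR = 60
UNIDENTIFIED_RATIO = 0.5

def categorize(probs, low=0.4, margin=0.15):
    """Label a 500 ms window from its class probability vector."""
    top = np.sort(probs)[::-1]
    if top[0] < low:
        return "unidentified"          # no class is confident
    if top[0] - top[1] < margin:
        return "uncertain"             # two classes compete closely
    return "confident"                 # keep the prediction, no extraction

def select_clips(windows, probs_per_window):
    """Pick windows to forward to Tidplay, respecting volume and balance."""
    budget_unid = int(MAX_CLIPS_PER_HOUR * UNIDENTIFIED_RATIO)
    budget_unc = MAX_CLIPS_PER_HOUR - budget_unid
    selected = []
    for clip, probs in zip(windows, probs_per_window):
        label = categorize(np.asarray(probs))
        if label == "unidentified" and budget_unid > 0:
            budget_unid -= 1
            selected.append((clip, label))
        elif label == "uncertain" and budget_unc > 0:
            budget_unc -= 1
            selected.append((clip, label))
    return selected
```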

Real-time bird detection with species identification.


Camera Trap Classifiers


Camera traps use movement detectors to trigger video recording. In an outdoor environment such as Tidmarsh, most triggers are caused by non-animal movement; common causes include rain, wind, and water flow, which together produce a large number of irrelevant video recordings. Deep Learning can provide a high level of visual semantic description, saving volunteer time. We have experimented with and deployed different types of computer vision models to pre-filter our motion video databases. We use our Tidplay platform to build a locally-dependent visual database to refine the pre-trained classifier model. This platform allows volunteers to create new classes and add new bounding boxes to video frames automatically extracted by a confidence function similar to the one used in our bio-acoustic classifier. Our current system is based on the YOLOv3 model and analyzes video recordings coming from 6 network cameras.
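
To make the pre-filtering step concrete, the sketch below samples frames from a motion-triggered video and keeps the clip only if a pretrained detector reports an animal-class object with sufficient confidence. The class subset, thresholds, frame stride, and the use of a YOLOv5 model from PyTorch Hub (as a convenient stand-in for the deployed YOLOv3 model) are assumptions for illustration, not the production configuration.

```python
import cv2
import torch

ANIMAL_CLASSES = {"bird", "cat", "dog", "horse"}   # illustrative subset of COCO labels
CONF_THRESHOLD = 0.5
FRAME_STRIDE = 30                                   # check roughly one frame per second

# Pretrained general-purpose detector used here as a stand-in pre-filter.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def contains_wildlife(video_path):
    """Return True if any sampled frame contains an animal-class detection."""
    cap = cv2.VideoCapture(video_path)
    frame_idx, found = 0, False
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % FRAME_STRIDE == 0:
            results = model(frame[..., ::-1])       # BGR -> RGB
            detections = results.pandas().xyxy[0]
            hits = detections[
                detections["name"].isin(ANIMAL_CLASSES)
                & (detections["confidence"] >= CONF_THRESHOLD)
            ]
            if len(hits) > 0:
                found = True
                break
        frame_idx += 1
    cap.release()
    return found
```

Videos that pass such a filter can be queued for volunteer annotation, keeping the review workload proportional to actual animal activity.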


Tidplay Annotation Platform


Tidplay is an open-source, crowd-sourced annotation web platform that we have designed to build training databases from audio and video sources. Users can upload, download, and share audio and video files, write annotations and comments, and create their custom databases while learning about wildlife. Tidplay has two intended user bases. First, wildlife ecologists can use Tidplay to share data for collaborating on the construction of annotated databases. Second, a tutorial mode can be used for public engagement
and student training. Users can learn how to distinguish different sounds coming from geophony, anthropophony and biophony, progressively developing their abilities to identify challenging bird calls, for example. The multiple training levels available allow users to extend their bio-acoustic skills by comparing their answers and discussing ambiguous recordings with other users ranging from novices
to experts. Recordings extracted automatically by Tidzam classifiers are integrated into Tidplay for cross-validation by multiple wildlife experts before being added to the training databases. The Tidplay platform can be used for timestamped annotation of audio (as shown below), for drawing bounding boxes on video frames, and for pose estimation.
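
To illustrate what annotators produce on Tidplay, the two records below show a plausible timestamped audio annotation and a video bounding-box annotation. All field names and values are hypothetical; they are not the actual Tidplay schema.

```python
# Hypothetical annotation records (field names and values are illustrative only).

audio_annotation = {
    "clip_id": "mic07_2019-05-12T05:31:02Z",
    "start_s": 1.5,                      # timestamped segment within the clip
    "end_s": 2.0,
    "label": "bird:red-winged-blackbird",
    "annotator": "volunteer_042",
    "status": "expert-confirmed",        # cross-validated before training use
}

video_annotation = {
    "frame_id": "camera03_2019-05-12T06:02:44Z_frame0113",
    "bbox_xywh": [412, 230, 96, 64],     # bounding box in pixel coordinates
    "label": "mammal:river-otter",
    "annotator": "volunteer_017",
    "status": "pending-review",
}
```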


Publications

Clement Duhart, Gershon Dublon, Brian Mayton, et al. "Deep Learning Locally Trained Wildlife Sensing in Real Acoustic Wetland Environment," Advances in Signal Processing and Intelligent Recognition Systems (SIRS 2018), Communications in Computer and Information Science, Springer, Singapore, January 2019. DOI: https://doi.org/10.1007/978-981-13-5758-9_1.

Contributors

Clement Duhart
Researcher
Brian Mayton
Researcher
Gershon Dublon
Researcher
Spencer Russell
Researcher
Joe Paradiso
Researcher

Institutions

MIT Media Lab