detected with IoU greater than 0.3. With IoU above
0.3, the task of locating animals becomes very easy in
extremely low-light and low-contrast images.
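For readers unfamiliar with the criterion, the following is a minimal sketch of how an IoU threshold of 0.3 can be applied to a predicted box and a ground-truth box. The corner-coordinate box format and the function names `iou` and `is_detected` are assumptions made for illustration and are not taken from our implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_detected(pred_box, gt_box, threshold=0.3):
    """Count a prediction as a detection when its IoU exceeds the threshold."""
    return iou(pred_box, gt_box) > threshold
```

A lower threshold such as 0.3 accepts looser localisation than the commonly used 0.5, which matters when animal boundaries are hard to delineate.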
In this work, we address the empty frame removal
problem and the animal detection challenge in camera
trap sequences. In tandem, we investigate the applicability
of ViT, DETR, and Faster R-CNN for this task.
Our experiments reaffirm the generalisation gap in the
context of unseen test data. We conclude our experimental
study by proposing a two-stage pipeline for
mining vital statistics from camera trap sequences. In
the first stage, we filter out empty frames, and in the
second stage, we perform wildlife detection and localisation.
Balancing the trade-off between retaining all
frames containing animals and filtering out all empty
frames, we adopt ViT (the best model on ‘cis’) for removing
empty frames and DETR for detecting animals.
Despite heavy background clutter, camouflage, size
and pose variations, occlusion, progressive illumination
changes from day to night, and seasonal variations
in flora and fauna in camera trap data, we obtain
competitive accuracy. We shall extend our work
to make the empty frame removal and animal detection
pipeline even more robust, especially under extreme
low-light and low-contrast conditions, and hence
to develop practically deployable wildlife detection
systems. Further, we plan to incorporate open-set
recognition, zero-shot learning, and few-shot learning for
generalising to unseen locations.
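The control flow of the two-stage pipeline summarised above can be sketched as follows. This is an illustrative outline only: `two_stage_pipeline`, `is_empty`, and `detect` are hypothetical names, and the two callables stand in for the ViT empty-frame classifier and the DETR detector respectively, whose actual interfaces are not specified here.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def two_stage_pipeline(
    frames: Sequence[Frame],
    is_empty: Callable[[Frame], bool],      # stage 1: stand-in for the ViT filter
    detect: Callable[[Frame], List[Box]],   # stage 2: stand-in for the DETR detector
) -> List[Tuple[int, List[Box]]]:
    """Return (frame index, detected boxes) for every frame kept by stage 1."""
    results: List[Tuple[int, List[Box]]] = []
    for index, frame in enumerate(frames):
        if is_empty(frame):
            continue  # empty frames are dropped before detection is run
        results.append((index, detect(frame)))
    return results


# Toy stand-ins: each "frame" is simply the list of boxes it contains.
toy_frames = [[], [(10.0, 20.0, 50.0, 80.0)], []]
toy_results = two_stage_pipeline(
    toy_frames,
    is_empty=lambda frame: len(frame) == 0,
    detect=lambda frame: list(frame),
)
```

Passing the two stages as callables keeps the sketch independent of any particular model library, and makes it clear that the detector only runs on frames retained by the filter.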
This work is partially supported by the National
Mission for Himalayan Studies (NMHS) grant
