Weakly-supervised Localization of Multiple Objects in Images using Cosine Loss

Björn Barz, Joachim Denzler

2022

Abstract

Can we learn to localize objects in images from just image-level class labels? Previous research has shown that this ability can be added to convolutional neural networks (CNNs) trained for image classification post hoc without additional cost or effort using so-called class activation maps (CAMs). However, while CAMs can localize a particular known class in the image quite accurately, they cannot detect and localize instances of multiple different classes in a single image. This limitation is a consequence of the missing comparability of prediction scores between classes, which results from training with the cross-entropy loss after a softmax activation. We find that CNNs trained with the cosine loss instead of cross-entropy do not exhibit this limitation and propose a variation of CAMs termed Dense Class Maps (DCMs) that fuse predictions for multiple classes into a coarse semantic segmentation of the scene. Even though the network has only been trained for single-label classification at the image level, DCMs allow for detecting the presence of multiple objects in an image and locating them. Our approach outperforms CAMs on the MS COCO object detection dataset by a relative increase of 27% in mean average precision.
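The ideas in the abstract can be illustrated with a minimal sketch. It assumes the usual form of the cosine loss with one-hot targets (one minus the cosine similarity between the network output and the target) and the standard CAM formulation, in which the linear classifier is applied to the feature vector at every spatial location. The abstract does not describe how DCMs fuse the per-class maps, so `fuse_maps`, its per-location normalization, its `threshold`, and the `background` label are hypothetical choices made for illustration only, not the authors' method.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean length (zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def cosine_loss(outputs, label):
    """1 - cos(outputs, one_hot(label)).

    A one-hot target already has unit length, so the cosine similarity
    reduces to the normalized output at the target index.
    """
    return 1.0 - l2_normalize(outputs)[label]


def class_maps(features, weights):
    """Apply a linear classifier at every spatial location (CAM-style).

    features: H x W x K nested lists (last convolutional feature map)
    weights:  C x K nested lists (classifier weights, bias omitted)
    returns:  H x W x C class scores
    """
    return [
        [[sum(w_k * f_k for w_k, f_k in zip(w, cell)) for w in weights]
         for cell in row]
        for row in features
    ]


def fuse_maps(scores, threshold=0.5, background=-1):
    """Hypothetical fusion into a coarse label map (an assumption, see above).

    Each location's score vector is L2-normalized so that scores are
    comparable across classes; the best class is kept if it reaches the
    threshold, otherwise the location is marked as background.
    """
    fused = []
    for row in scores:
        fused_row = []
        for cell in row:
            unit = l2_normalize(cell)
            best = max(range(len(unit)), key=unit.__getitem__)
            fused_row.append(best if unit[best] >= threshold else background)
        fused.append(fused_row)
    return fused


# Toy example: a 1 x 3 feature map with K = 2 channels and C = 2 classes.
features = [[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]]
weights = [[1.0, 0.0], [0.0, 1.0]]
label_map = fuse_maps(class_maps(features, weights))
```

In this toy example the first two locations are assigned to different classes and the empty third location falls back to the background label, which is the kind of multi-class output a single per-class CAM cannot provide on its own.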


Paper Citation


in Harvard Style

Barz B. and Denzler J. (2022). Weakly-supervised Localization of Multiple Objects in Images using Cosine Loss. In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 5: VISAPP; ISBN 978-989-758-555-5, SciTePress, pages 287-296. DOI: 10.5220/0010760800003124


in Bibtex Style

@conference{visapp22,
author={Björn Barz and Joachim Denzler},
title={Weakly-supervised Localization of Multiple Objects in Images using Cosine Loss},
booktitle={Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 5: VISAPP},
year={2022},
pages={287-296},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010760800003124},
isbn={978-989-758-555-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 5: VISAPP
TI - Weakly-supervised Localization of Multiple Objects in Images using Cosine Loss
SN - 978-989-758-555-5
AU - Barz B.
AU - Denzler J.
PY - 2022
SP - 287
EP - 296
DO - 10.5220/0010760800003124
PB - SciTePress