9525–9536, Red Hook, NY, USA. Curran Associates
Inc.
Chan, L., Hosseini, M. S., and Plataniotis, K. N. (2021).
A comprehensive analysis of weakly-supervised se-
mantic segmentation in different image domains.
International Journal of Computer Vision (IJCV),
129(2):361–384.
Chattopadhay, A., Sarkar, A., Howlader, P., and Balasub-
ramanian, V. N. (2018). Grad-CAM++: Generalized
gradient-based visual explanations for deep convolu-
tional networks. In IEEE Winter Conference on Appli-
cations of Computer Vision (WACV), pages 839–847.
IEEE.
Dhillon, A. and Verma, G. K. (2020). Convolutional neu-
ral network: a review of models, methodologies and
applications to object detection. Progress in Artificial
Intelligence, 9(2):85–112.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision (IJCV), 88(2):303–338.
Huff, D. T., Weisman, A. J., and Jeraj, R. (2021). Interpre-
tation and visualization techniques for deep learning
models in medical imaging. Physics in Medicine &
Biology, 66(4):04TR01.
Lee, J. R., Kim, S., Park, I., Eo, T., and Hwang, D. (2021).
Relevance-CAM: Your model already knows where
to look. In IEEE/CVF Conference on Computer Vi-
sion and Pattern Recognition (CVPR), pages 14944–
14953.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In Fleet, D., Pajdla, T., Schiele, B., and Tuytelaars, T., editors, European Conference on Computer Vision (ECCV), pages 740–755, Cham. Springer International Publishing.
Minaee, S., Boykov, Y. Y., Porikli, F., Plaza, A. J., Kehtar-
navaz, N., and Terzopoulos, D. (2021). Image seg-
mentation using deep learning: A survey. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence
(PAMI), pages 1–1.
Pons, J., Slizovskaia, O., Gong, R., Gómez, E., and Serra, X. (2017). Timbre analysis of music audio signals with convolutional neural networks. In 25th European Signal Processing Conference (EUSIPCO), pages 2744–2748. IEEE.
Ramaswamy, H. G. et al. (2020). Ablation-CAM: Vi-
sual explanations for deep convolutional network via
gradient-free localization. In IEEE/CVF Winter Con-
ference on Applications of Computer Vision (WACV),
pages 983–991.
Rawat, W. and Wang, Z. (2017). Deep convolutional neural
networks for image classification: A comprehensive
review. Neural Computation, 29(9):2352–2449.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., and Batra, D. (2017). Grad-CAM: Visual
explanations from deep networks via gradient-based
localization. In IEEE International Conference on
Computer Vision (ICCV), pages 618–626.
Shendryk, I., Rist, Y., Lucas, R., Thorburn, P., and Ticehurst, C. (2018). Deep learning - a new approach for multi-label scene classification in PlanetScope and Sentinel-2 imagery. In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pages 1116–1119.
Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.
Smilkov, D., Thorat, N., Kim, B., Viégas, F. B., and Wattenberg, M. (2017). SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825.
Springenberg, J., Dosovitskiy, A., Brox, T., and Riedmiller,
M. (2015). Striving for simplicity: The all convolu-
tional net. In International Conference on Learning
Representations (ICLR) - Workshop Track.
Srinivas, S. and Fleuret, F. (2019). Full-gradient represen-
tation for neural network visualization. arXiv preprint
arXiv:1905.00780.
Su, Y., Sun, R., Lin, G., and Wu, Q. (2021). Context decoupling augmentation for weakly supervised semantic segmentation. arXiv preprint arXiv:2103.01795.
Tachibana, H., Uenoyama, K., and Aihara, S. (2018). Effi-
ciently trainable text-to-speech system based on deep
convolutional networks with guided attention. In
IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), pages 4784–4788.
Tarekegn, A. N., Giacobini, M., and Michalak, K. (2021).
A review of methods for imbalanced multi-label clas-
sification. Pattern Recognition, 118:107965.
Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., and Hu, X. (2020). Score-CAM: Score-weighted visual explanations for convolutional neural networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 111–119.
Wei, S.-E., Ramakrishna, V., Kanade, T., and Sheikh, Y. (2016). Convolutional pose machines. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4724–4732.
Yao, L., Mao, C., and Luo, Y. (2019). Graph convolutional networks for text classification. In AAAI Conference on Artificial Intelligence, volume 33, pages 7370–7377.
Zhang, D., Han, J., Cheng, G., and Yang, M.-H. (2021). Weakly supervised object localization and detection: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), pages 1–1.
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Tor-
ralba, A. (2016). Learning deep features for discrimi-
native localization. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), pages 2921–
2929.