Brendel, W., Rauber, J., and Bethge, M. (2018). Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In International Conference on Learning Representations (ICLR) 2018.
Carlini, N. and Wagner, D. (2017). Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57.
Chen, J., Jordan, M. I., and Wainwright, M. J. (2020). HopSkipJumpAttack: A query-efficient decision-based attack. In 2020 IEEE Symposium on Security and Privacy (SP), pages 1277–1294.
Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems 26, volume 26, pages 2292–2300.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255.
Donahue, J., Krähenbühl, P., and Darrell, T. (2016). Adversarial feature learning. arXiv preprint arXiv:1605.09782.
Donahue, J. and Simonyan, K. (2019). Large scale adversarial representation learning. In Advances in Neural Information Processing Systems, volume 32, pages 10541–10551.
Dumoulin, V., Belghazi, I., Poole, B., Mastropietro, O., Lamb, A., Arjovsky, M., and Courville, A. (2016). Adversarially learned inference. arXiv preprint arXiv:1606.00704.
Feydy, J., Séjourné, T., Vialard, F.-X., Amari, S.-i., Trouvé, A., and Peyré, G. (2019). Interpolating between optimal transport and MMD using Sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2681–2690.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2015). Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR) 2015.
Johnson, N. F. and Jajodia, S. (1998). Exploring steganography: Seeing the unseen. Computer, 31(2):26–34.
Kurakin, A., Goodfellow, I., and Bengio, S. (2016). Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236.
LeCun, Y., Cortes, C., and Burges, C. (2010). MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR) 2018.
Meng, D. and Chen, H. (2017). MagNet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 135–147.
Nocedal, J. and Wright, S. (2006). Numerical Optimization. Springer Science & Business Media.
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., and Swami, A. (2016). The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387.
Rauber, J., Zimmermann, R., Bethge, M., and Brendel, W. (2020). Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX. Journal of Open Source Software, 5(53):2607.
Samangouei, P., Kabkab, M., and Chellappa, R. (2018). Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations (ICLR) 2018.
Simonyan, K. and Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR) 2015.
Song, Y., Shu, R., Kushman, N., and Ermon, S. (2018). Constructing unrestricted adversarial examples with generative models. In Advances in Neural Information Processing Systems, volume 31, pages 8312–8323.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2014). Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR) 2014.
Tan, M. and Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, pages 6105–6114.
Wong, E., Schmidt, F. R., and Kolter, J. Z. (2019). Wasserstein adversarial examples via projected Sinkhorn iterations. In International Conference on Machine Learning, pages 6808–6817.
Xiao, C., Li, B., Zhu, J.-Y., He, W., Liu, M., and Song, D. (2018). Generating adversarial examples with adversarial networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pages 3905–3911.
APPENDIX: ADDITIONAL RESULTS
This section extends the results from the main manuscript body. We always present a figure and then compare it with the corresponding figure from the manuscript body. The former figures are labelled with a letter, while the latter are labelled with a digit.
Figure 8 shows the untargeted attacks on the non-robust network and corresponds to Figure 3 from the manuscript body. The generated images are again of good visual quality, with the Wasserstein attack performing better than the l_2 attack. The small digit in each subfigure indicates the class into which the digit was misclassified. As we have already mentioned, our method always operates on feasible points, and therefore all digits were successfully misclassified. In other words, these images were generated at random, without any need for manual selection.
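The following is a minimal sketch (not the authors' code) of how such figure panels could be drawn at random once every generated point is guaranteed to be misclassified; the names `classifier` and `generate_adversarial` are hypothetical stand-ins for the trained MNIST classifier and the decoder-level attack.

import torch

def sample_panels(classifier, generate_adversarial, images, labels, n_panels=9):
    # Generated points are feasible adversarial examples by construction,
    # so every prediction should already differ from the true label.
    adv = generate_adversarial(images)
    pred = classifier(adv).argmax(dim=1)
    assert (pred != labels).all(), "expected 100% misclassification"
    # Pick panels at random: no manual cherry-picking is needed.
    idx = torch.randperm(len(adv))[:n_panels]
    # Return the images and the predicted (wrong) classes used as the
    # small digit annotations in each subfigure.
    return adv[idx], pred[idx]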