information to the feature map via skip connections is effective for visualizing important frames.
5  CONCLUSION 
This paper proposed Integration-Net, which integrates different networks to classify wild-type and mutant sperm of liverwort. It achieves more accurate classification than conventional video classification methods. By analyzing the gradients of the network, our method automatically discovered that the two types of sperm differ in their flagella.
However, the resulting heat maps may be ambiguous or blurry. Therefore, we would like to adopt visualization methods that do not depend on gradient computation, such as Score-CAM (Wang et al., 2020), or to develop more detailed visualization methods based on it.
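The core idea of Score-CAM can be sketched without any gradient computation: each activation map is normalized, used to mask the input, and weighted by the model's score for the target class on that masked input. The sketch below is a minimal single-image NumPy illustration; `score_fn` is a hypothetical callable standing in for a forward pass that returns the target-class score, and the activation maps are assumed to be already upsampled to the input resolution (the original method upsamples them from the last convolutional layer).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def score_cam(activations, score_fn, image):
    """Gradient-free class activation map in the spirit of Score-CAM.

    activations: (K, H, W) activation maps, assumed upsampled to image size.
    score_fn:    hypothetical callable, image -> target-class score.
    image:       (H, W) input image.
    """
    weights = []
    for a in activations:
        # Normalize each map to [0, 1] so it acts as a soft mask.
        m = (a - a.min()) / (a.max() - a.min() + 1e-8)
        # Score the input masked by this map; no gradients involved.
        weights.append(score_fn(image * m))
    w = softmax(np.asarray(weights, dtype=float))
    # Weighted sum of maps, followed by ReLU, gives the heat map.
    return np.maximum((w[:, None, None] * activations).sum(axis=0), 0.0)
```

Because the weights come from forward passes on masked inputs rather than from backpropagated gradients, the map avoids the gradient noise that can make Grad-CAM-style heat maps blurry.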
In addition, we would like to use ConvLSTM (Xingjian et al., 2015) to exploit motion information more effectively.
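For reference, a ConvLSTM replaces the matrix multiplications in an LSTM cell with convolutions, so the hidden and cell states retain a spatial layout suited to video. The following is a minimal single-channel NumPy sketch of one time step under simplified assumptions (no bias terms, no Hadamard peephole connections from the original formulation); the kernel dictionary `W` and its keys are illustrative names, not an established API.

```python
import numpy as np

def conv2d(x, k):
    """'Same' 2-D cross-correlation with zero padding (single channel)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, W):
    """One ConvLSTM step: gate pre-activations are convolutions over the
    input x and previous hidden state h, so states stay spatial maps."""
    i = sigmoid(conv2d(x, W['xi']) + conv2d(h, W['hi']))  # input gate
    f = sigmoid(conv2d(x, W['xf']) + conv2d(h, W['hf']))  # forget gate
    o = sigmoid(conv2d(x, W['xo']) + conv2d(h, W['ho']))  # output gate
    g = np.tanh(conv2d(x, W['xg']) + conv2d(h, W['hg']))  # candidate state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Applied frame by frame to a sperm video, the hidden state would accumulate motion cues such as flagellar movement while preserving where in the frame they occur.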
ACKNOWLEDGEMENT 
This  research  is  partially  supported  by  JSPS 
KAKENHI Grant Number 20H05427. 
REFERENCES 
Ji, S., Xu, W., Yang, M., & Yu, K., "3D convolutional neural networks for human action recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, pp. 221-231, 2013.
Dalal, N., Triggs, B., & Schmid, C., "Human detection using oriented histograms of flow and appearance", In European Conference on Computer Vision, pp. 428-441, 2006.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D., "Grad-CAM: Visual explanations from deep networks via gradient-based localization", In IEEE International Conference on Computer Vision, pp. 618-626, 2017.
Chattopadhay, A., Sarkar, A., Howlader, P., & Balasubramanian, V. N., "Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks", In IEEE Winter Conference on Applications of Computer Vision, pp. 839-847, 2018.
Anderson, D. J., & Perona, P., "Toward a science of computational ethology", Neuron, Vol. 84, pp. 18-31, 2014.
Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M., "Learning spatiotemporal features with 3D convolutional networks", In IEEE International Conference on Computer Vision, pp. 4489-4497, 2015.
Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M., "Striving for simplicity: The all convolutional net", In International Conference on Learning Representations Workshops, 2015.
Zeiler, M. D., & Fergus, R., "Visualizing and understanding convolutional networks", In European Conference on Computer Vision, pp. 818-833, 2014.
Ma, N., Zhang, X., & Sun, J., "Funnel activation for visual recognition", In European Conference on Computer Vision, pp. 351-368, 2020.
Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P., "Focal loss for dense object detection", In IEEE International Conference on Computer Vision, pp. 2980-2988, 2017.
Hara, K., Kataoka, H., & Satoh, Y., "Learning spatio-temporal features with 3D residual networks for action recognition", In IEEE International Conference on Computer Vision Workshops, pp. 3154-3160, 2017.
Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., & Hu, X., "Score-CAM: Score-weighted visual explanations for convolutional neural networks", In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 24-25, 2020.
Xingjian, S. H. I., Chen, Z., Wang, H., Yeung, D. Y., Wong, W. K., & Woo, W. C., "Convolutional LSTM network: A machine learning approach for precipitation nowcasting", In Advances in Neural Information Processing Systems, pp. 802-810, 2015.