PASCAL VOC2012 dataset. As shown in Table 5,
when we did not introduce additional connections, the
accuracy was 80.36%. Adding a cross cooperative
connection at Connection1 yielded a gain of 0.16%.
Adding a cross connection at Connection2 improved
accuracy by 1.15% in comparison with the proposed
method without additional connections. ASPP improves
feature extraction by applying several dilated
convolutions and pooling, so it produces beneficial
feature maps. These results suggest that the cross
connection at Connection1 benefits ASPP, while the
cross connection at Connection2 benefits the decoding
of the extracted information. Overall, the results
demonstrate the effectiveness of the additional cross
cooperative connections.
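The arithmetic behind these figures can be sketched as follows. This is a minimal illustration that assumes the reported gains are absolute percentage points added to the 80.36% baseline; the text does not state this explicitly, and the authoritative values are those in Table 5. The names `BASELINE_ACC`, `REPORTED_GAINS`, and `accuracy_with_gain` are hypothetical.

```python
# Baseline accuracy and gains as quoted in the text.
# ASSUMPTION: each gain is an absolute percentage-point improvement
# over the baseline; the source does not state this explicitly.
BASELINE_ACC = 80.36
REPORTED_GAINS = {"Connection1": 0.16, "Connection2": 1.15}


def accuracy_with_gain(baseline, gain):
    """Return the accuracy implied by adding a percentage-point gain."""
    return round(baseline + gain, 2)


# Implied accuracy for each cross connection under the assumption above.
implied = {
    name: accuracy_with_gain(BASELINE_ACC, gain)
    for name, gain in REPORTED_GAINS.items()
}

for name, acc in implied.items():
    print(f"{name}: {acc:.2f}%")
```

Under this reading, the gain at Connection2 is roughly seven times the gain at Connection1, which is consistent with the observation that the connection on the decoding side contributes more.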
5 CONCLUSION
In this paper, we proposed a new cooperative learning
method that fuses the features of different backbone
networks for semantic segmentation. Specifically, we
used cross cooperative learning with two different
backbones, which improved on conventional cooperative
learning. We confirmed that our method improved the
segmentation accuracy on the PASCAL VOC2012 and
Cityscapes datasets.
The proposed cross cooperative network requires
substantial computational resources because it needs
multiple backbone networks. Realizing cross
cooperative learning with lower computational cost
while keeping high accuracy is a subject for future
work.
ACKNOWLEDGEMENTS 
This work was partially supported by JSPS KAKENHI
18K11382.
REFERENCES 
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet
classification with deep convolutional neural
networks. In: Advances in Neural Information
Processing Systems. pp. 1097–1105 (2012)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich,
A.: Going deeper with convolutions. In: Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition. pp. 1–9 (2015)
Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H.,
Wang, X., Tang, X.: Residual attention network for
image classification. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern
Recognition. pp. 3156–3164 (2017)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only
look once: Unified, real-time object detection. In:
Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. pp. 779–788 (2016)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C.Y., Berg, A.C.: SSD: Single shot multibox
detector. In: Proceedings of the European Conference
on Computer Vision. pp. 21–37. Springer (2016)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.:
OpenPose: Realtime multi-person 2D pose estimation
using part affinity fields. arXiv preprint
arXiv:1812.08008 (2018)
Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime
multi-person 2D pose estimation using part affinity
fields. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. pp. 7291–7299 (2017)
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image
translation with conditional adversarial networks. In:
Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. pp. 1125–1134 (2017)
Ronneberger, O., Fischer, P., Brox, T.: U-Net:
Convolutional networks for biomedical image
segmentation. In: International Conference on Medical
Image Computing and Computer-Assisted
Intervention. pp. 234–241. Springer (2015)
Chen, L.C., Collins, M., Zhu, Y., Papandreou, G., Zoph, B.,
Schroff, F., Adam, H., Shlens, J.: Searching for efficient
multi-scale architectures for dense image prediction. In:
Advances in Neural Information Processing Systems.
pp. 8699–8710 (2018)
Yang, M., Yu, K., Zhang, C., Li, Z., Yang, K.: DenseASPP
for semantic segmentation in street scenes. In:
Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. pp. 3684–3692 (2018)
Havaei, M., Davy, A., Warde-Farley, D., Biard, A.,
Courville, A., Bengio, Y., Pal, C., Jodoin, P.M.,
Larochelle, H.: Brain tumor segmentation with deep
neural networks. Medical Image Analysis 35, 18–31
(2017)
Ji, X., Li, Y., Cheng, J., Yu, Y., Wang, M.: Cell image
segmentation based on an improved watershed
algorithm. In: 2015 8th International Congress on
Image and Signal Processing. pp. 433–437 (2015)
Ryota, I., Kazuhiro, H.: Feature sharing cooperative
network for semantic segmentation. In: Proceedings
of the 16th International Joint Conference on Computer
Vision, Imaging and Computer Graphics Theory and
Applications. pp. 577–584 (2021)
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler,
M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The
Cityscapes dataset for semantic urban scene
understanding. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition. pp.
3213–3223 (2016)