<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>47</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Hosseini Jafari, Omid</style></author><author><style face="normal" font="default" size="100%">Groth, Oliver</style></author><author><style face="normal" font="default" size="100%">Kirillov, Alexander</style></author><author><style face="normal" font="default" size="100%">Yang, Michael Ying</style></author><author><style face="normal" font="default" size="100%">Rother, Carsten</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Analyzing modular CNN architectures for joint depth prediction and semantic segmentation</style></title><secondary-title><style face="normal" font="default" size="100%">Proceedings - IEEE International Conference on Robotics and Automation</style></secondary-title></titles><dates><year><style face="normal" font="default" size="100%">2017</style></year><pub-dates><date><style face="normal" font="default" size="100%">Feb</style></date></pub-dates></dates><urls><web-urls><url><style face="normal" font="default" size="100%">http://arxiv.org/abs/1702.08009 http://dx.doi.org/10.1109/ICRA.2017.7989537</style></url></web-urls></urls><pages><style face="normal" font="default" size="100%">4620–4627</style></pages><isbn><style face="normal" font="default" size="100%">9781509046331</style></isbn><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">This paper addresses the task of designing a modular neural network architecture that jointly solves different tasks. As an example, we use the tasks of depth estimation and semantic segmentation given a single RGB image.
The main focus of this work is to analyze the cross-modality influence between depth and semantic prediction maps during their joint refinement. While most previous works focus solely on measuring improvements in accuracy, we propose a way to quantify the cross-modality influence. We show that there is a relationship between final accuracy and cross-modality influence, although not a simple linear one; hence, a larger cross-modality influence does not necessarily translate into improved accuracy. We find that a beneficial balance between the cross-modality influences can be achieved through network architecture design, and we conjecture that this relationship can be used to understand different network design choices. To this end, we propose a Convolutional Neural Network (CNN) architecture that fuses state-of-the-art approaches for depth estimation and semantic labeling. By balancing the cross-modality influences between depth and semantic prediction, we achieve improved results for both tasks on the NYU-Depth v2 benchmark.</style></abstract></record></records></xml>