|This paper presents a novel Siamese network architecture, as a variant of Resnet-50, to estimate the relative camera pose on multi-view environments. In order to improve the performance of the proposed model
a transfer learning strategy, based on synthetic images obtained from a virtual-world, is considered. The
transfer learning consist of first training the network using pairs of images from the virtual-world scenario
considering different conditions (i.e., weather, illumination, objects, buildings, etc.); then, the learned weight
of the network are transferred to the real case, where images from real-world scenarios are considered. Experimental results and comparisons with the state of the art show both, improvements on the relative pose
estimation accuracy using the proposed model, as well as further improvements when the transfer learning
strategy (synthetic-world data – transfer learning – real-world data) is considered to tackle the limitation on
the training due to the reduced number of pairs of real-images on most of the public data sets.