Abstract: This paper proposes to use a deep learning network architecture for relative camera pose estimation on a multi-view environment. The proposed network is a variant architecture of AlexNet to use as regressor for prediction the relative translation and rotation as output. The proposed approach is trained from scratch on a large data set that takes as input a pair of images from the same scene. This new architecture is compared with a previous approach using standard metrics, obtaining better results on the relative camera pose.