Santos V., Angel D. Sappa, Oliveira M., & de la Escalera A. (2019). Special Issue on Autonomous Driving and Driver Assistance Systems. In Robotics and Autonomous Systems, 121.
|
Jorge L. Charco, Angel D. Sappa, Boris X. Vintimilla, & Henry O. Velesaca. (2020). Transfer Learning from Synthetic Data in the Camera Pose Estimation Problem. In The 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020); Valletta, Malta; 27-29 February 2020 (Vol. 4, pp. 498–505).
Abstract: This paper presents a novel Siamese network architecture, as a variant of ResNet-50, to estimate the relative camera pose in multi-view environments. In order to improve the performance of the proposed model, a transfer learning strategy based on synthetic images obtained from a virtual world is considered. The transfer learning consists of first training the network using pairs of images from the virtual-world scenario under different conditions (i.e., weather, illumination, objects, buildings, etc.); then, the learned weights of the network are transferred to the real case, where images from real-world scenarios are considered. Experimental results and comparisons with the state of the art show both improvements in relative pose estimation accuracy using the proposed model and further improvements when the transfer learning strategy (synthetic-world data, transfer learning, real-world data) is used to tackle the training limitation caused by the reduced number of real image pairs in most public datasets.
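The synthetic-to-real transfer strategy described above can be summarized in a few lines. Below is a minimal sketch, assuming PyTorch/torchvision; the Siamese branches share a ResNet-50 backbone, and the regression head, file name, and hyper-parameters are illustrative placeholders rather than the paper's exact configuration.
```python
# Sketch of a Siamese relative-pose regressor with a shared ResNet-50 backbone,
# plus the synthetic -> real transfer step (save weights, reload, fine-tune).
import torch
import torch.nn as nn
from torchvision import models


class SiameseRelativePose(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Identity()          # keep the 2048-d global feature
        self.backbone = backbone             # shared weights -> Siamese branches
        self.regressor = nn.Sequential(
            nn.Linear(2 * 2048, 512), nn.ReLU(),
            nn.Linear(512, 7),               # 3-d translation + 4-d rotation (quaternion)
        )

    def forward(self, img_a, img_b):
        feat_a = self.backbone(img_a)
        feat_b = self.backbone(img_b)
        return self.regressor(torch.cat([feat_a, feat_b], dim=1))


model = SiameseRelativePose()

# Step 1: train on synthetic (virtual-world) image pairs, then store the weights.
torch.save(model.state_dict(), "synthetic_pretrained.pth")   # illustrative file name

# Step 2: transfer -- reload the synthetic weights and fine-tune on real image pairs.
model.load_state_dict(torch.load("synthetic_pretrained.pth"))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)     # fine-tuning on real data

pose = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))  # 7-d relative pose
```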
|
Rafael E. Rivadeneira, Angel D. Sappa, & Boris X. Vintimilla. (2020). Thermal Image Super-Resolution: a Novel Architecture and Dataset. In The 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020); Valletta, Malta; 27-29 February 2020 (Vol. 4, pp. 111–119).
Abstract: This paper proposes a novel CycleGAN architecture for thermal image super-resolution, together with a large dataset consisting of thermal images at different resolutions. The dataset has been acquired using three thermal cameras at different resolutions, which capture images of the same scenario at the same time. The thermal cameras are mounted in a rig, trying to minimize the baseline distance in order to ease the registration problem. The proposed architecture is based on ResNet6 as the generator and PatchGAN as the discriminator. The novel unsupervised super-resolution training (CycleGAN) is possible due to the existence of the aforementioned thermal images, i.e., images of the same scenario at different resolutions. The proposed approach is evaluated on the dataset and compared with classical bicubic interpolation. The dataset and the network are available.
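As a rough illustration of the two components named in the abstract, the sketch below builds a ResNet-style generator with six residual blocks and a PatchGAN discriminator in PyTorch. Channel counts, normalization choices, and the single-channel thermal input are assumptions; the up-scaling path and the cycle-consistency losses of the full CycleGAN training are omitted.
```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.block(x)               # identity shortcut


class ResNet6Generator(nn.Module):
    """Single-channel thermal input -> single-channel output, six residual blocks."""
    def __init__(self, ch=64, n_blocks=6):
        super().__init__()
        layers = [nn.Conv2d(1, ch, 7, padding=3), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True)]
        layers += [ResidualBlock(ch) for _ in range(n_blocks)]
        layers += [nn.Conv2d(ch, 1, 7, padding=3), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


class PatchDiscriminator(nn.Module):
    """PatchGAN: outputs a grid of real/fake scores instead of a single scalar."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.InstanceNorm2d(ch * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 2, 1, 4, padding=1),   # per-patch score map
        )

    def forward(self, x):
        return self.net(x)


g = ResNet6Generator()
d = PatchDiscriminator()
translated = g(torch.randn(1, 1, 128, 128))    # same spatial size in this simplified sketch
scores = d(translated)                         # 2-D score map, one value per image patch
```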
|
Suárez P. (2021). Processing and Representation of Multispectral Images Using Deep Learning Techniques. In Electronic Letters on Computer Vision and Image Analysis, 19(2), pp. 5–8.
|
Rafael E. Rivadeneira, Angel D. Sappa, Boris X. Vintimilla, Lin Guo, Jiankun Hou, Armin Mehri, et al. (2020). Thermal Image Super-Resolution Challenge – PBVS 2020. In The 16th IEEE Workshop on Perception Beyond the Visible Spectrum at the Conference on Computer Vision and Pattern Recognition (CVPR 2020) (Vol. 2020-June, pp. 432–439).
Abstract: This paper summarizes the top contributions to the first challenge on thermal image super-resolution (TISR), which was organized as part of the Perception Beyond the Visible Spectrum (PBVS) 2020 workshop. In this challenge, a novel thermal image dataset is considered together with state-of-the-art approaches evaluated under a common framework. The dataset used in the challenge consists of 1021 thermal images, obtained from three distinct thermal cameras at different resolutions (low-resolution, mid-resolution, and high-resolution), resulting in a total of 3063 thermal images. From each resolution, 951 images are used for training and 50 for testing, while the 20 remaining images are used for two proposed evaluations. The first evaluation consists of downsampling the low-resolution, mid-resolution, and high-resolution thermal images by x2, x3 and x4 respectively, and comparing their super-resolution results with the corresponding ground truth images. The second evaluation consists of obtaining the x2 super-resolution from a given mid-resolution thermal image and comparing it with the corresponding semi-registered high-resolution thermal image. Out of 51 registered participants, 6 teams reached the final validation phase.
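A hedged sketch of the first evaluation protocol is given below: each image is downsampled by the per-camera factor, super-resolved (plain bicubic interpolation stands in for a submitted method), and compared with the original as ground truth using PSNR. OpenCV and NumPy are assumed, and the demo image is a synthetic placeholder.
```python
import numpy as np
import cv2


def psnr(reference, estimate, max_val=255.0):
    # Peak signal-to-noise ratio between ground truth and the super-resolved estimate.
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


scale_per_camera = {"low": 2, "mid": 3, "high": 4}   # x2, x3, x4 as in the challenge


def evaluate(image, scale):
    h, w = image.shape[:2]
    lr = cv2.resize(image, (w // scale, h // scale), interpolation=cv2.INTER_AREA)
    sr = cv2.resize(lr, (w, h), interpolation=cv2.INTER_CUBIC)   # replace with a TISR model
    return psnr(image, sr)


# Synthetic image standing in for a thermal frame:
demo = (np.random.rand(480, 640) * 255).astype(np.uint8)
print({cam: round(evaluate(demo, s), 2) for cam, s in scale_per_camera.items()})
```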
|
Xavier Soria, Edgar Riba, & Angel D. Sappa. (2020). Dense Extreme Inception Network: Towards a Robust CNN Model for Edge Detection. In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 1912–1921).
Abstract: This paper proposes a Deep Learning based edge detector, which is inspired by both HED (Holistically-Nested Edge Detection) and Xception networks. The proposed approach generates thin edge-maps that are plausible for human eyes; it can be used in any edge detection task without previous training or a fine-tuning process. As a second contribution, a large dataset with carefully annotated edges has been generated. This dataset has been used for training the proposed approach as well as the state-of-the-art algorithms for comparison. Quantitative and qualitative evaluations have been performed on different benchmarks, showing improvements with the proposed method when the F-measure of ODS and OIS is considered.
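To illustrate the evaluation criterion mentioned above, the toy sketch below scans thresholds on a predicted edge-probability map and reports the best F-measure. The official ODS/OIS benchmark additionally matches boundaries with a spatial tolerance, which this simplified version omits; the arrays are synthetic placeholders.
```python
import numpy as np


def f_measure(pred_edges, gt_edges):
    tp = np.logical_and(pred_edges, gt_edges).sum()
    precision = tp / max(pred_edges.sum(), 1)
    recall = tp / max(gt_edges.sum(), 1)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def best_threshold(prob_map, gt_edges, thresholds=np.linspace(0.05, 0.95, 19)):
    # Picking the best threshold per image corresponds to OIS; ODS fixes a single
    # threshold across the whole dataset.
    scores = [(t, f_measure(prob_map >= t, gt_edges)) for t in thresholds]
    return max(scores, key=lambda s: s[1])        # (threshold, F) for this image


prob_map = np.random.rand(64, 64)                 # stand-in for a predicted edge map
gt_edges = np.zeros((64, 64), dtype=bool)
gt_edges[32, :] = True                            # toy ground-truth edge annotation
print(best_threshold(prob_map, gt_edges))
```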
|
Henry O. Velesaca, Raul A. Mira, Patricia L. Suarez, Christian X. Larrea, & Angel D. Sappa. (2020). Deep Learning based Corn Kernel Classification. In The 1st International Workshop and Prize Challenge on Agriculture-Vision: Challenges & Opportunities for Computer Vision in Agriculture at the Conference on Computer Vision and Pattern Recognition (CVPR 2020) (Vol. 2020-June, pp. 294–302).
Abstract: This paper presents a full pipeline to classify sample sets of corn kernels. The proposed approach follows a segmentation-classification scheme. The image segmentation is performed through a well-known deep learning based approach, the Mask R-CNN architecture, while the classification is performed by means of a novel lightweight network specially designed for this task; good corn kernel, defective corn kernel and impurity categories are considered. As a second contribution, a carefully annotated multi-touching corn kernel dataset has been generated. This dataset has been used for training the segmentation and classification modules. Quantitative evaluations have been performed and comparisons with other approaches are provided, showing improvements with the proposed pipeline.
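The segmentation-classification scheme can be sketched as follows, assuming PyTorch and torchvision: an off-the-shelf Mask R-CNN proposes kernel instances, and each crop is passed to a small three-class classifier (good kernel, defective kernel, impurity). The lightweight classifier shown here is a generic stand-in, not the network designed in the paper, and the input image is a random placeholder.
```python
import torch
import torch.nn as nn
from torchvision.models.detection import maskrcnn_resnet50_fpn


class KernelClassifier(nn.Module):
    def __init__(self, n_classes=3):                      # good / defective / impurity
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))


segmenter = maskrcnn_resnet50_fpn(weights=None).eval()    # instance segmentation stage
classifier = KernelClassifier().eval()                    # per-crop classification stage

image = torch.rand(3, 480, 640)                           # stand-in corn-sample image
with torch.no_grad():
    detections = segmenter([image])[0]                    # dict with boxes, masks, scores
    for box in detections["boxes"]:
        x1, y1, x2, y2 = box.int().tolist()
        crop = image[:, y1:y2, x1:x2]
        if crop.numel() == 0:
            continue                                      # skip degenerate boxes
        crop = nn.functional.interpolate(crop[None], size=(64, 64), mode="bilinear")
        label = classifier(crop).argmax(dim=1)            # kernel category for this crop
```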
|
Henry O. Velesaca, S. A., Patricia L. Suarez, Ángel Sanchez & Angel D. Sappa. (2020). Off-the-Shelf Based System for Urban Environment Video Analytics. In The 27th International Conference on Systems, Signals and Image Processing (IWSSIP 2020) (Vol. 2020-July, pp. 459–464).
Abstract: This paper presents the design and implementation details of a system built up using off-the-shelf algorithms for urban video analytics. The system allows connection to public video surveillance camera networks to obtain the information needed to generate statistics for urban scenarios (e.g., number of vehicles, type of cars, direction, number of persons, etc.). The obtained information could be used not only for traffic management but also to estimate the carbon footprint of urban scenarios. As a case study, a university campus is selected to evaluate the performance of the proposed system. The system is implemented in a modular way so that it can be used as a testbed to evaluate different algorithms. Implementation results are provided, showing the validity and utility of the proposed approach.
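The modular design mentioned in the abstract can be outlined as below: frames flow through an interchangeable detector module, and a counting module accumulates per-class statistics (vehicles, persons, etc.). The `DummyDetector`, the class names, and the thresholds are hypothetical placeholders for whatever off-the-shelf detector the system plugs in.
```python
from collections import Counter
from typing import Iterable, List, Tuple

Detection = Tuple[str, float]                  # (class name, confidence)


class CountingModule:
    """Accumulates how many objects of each class were observed."""
    def __init__(self, min_confidence: float = 0.5):
        self.min_confidence = min_confidence
        self.totals: Counter = Counter()

    def update(self, detections: Iterable[Detection]) -> None:
        for name, conf in detections:
            if conf >= self.min_confidence:
                self.totals[name] += 1

    def report(self) -> dict:
        return dict(self.totals)


def run_pipeline(frames, detector, stats: CountingModule):
    # `detector` can be any off-the-shelf model exposing detect(frame) -> [Detection];
    # being able to swap it is what makes the system usable as a testbed.
    for frame in frames:
        stats.update(detector.detect(frame))
    return stats.report()


class DummyDetector:                            # stand-in for an off-the-shelf detector
    def detect(self, frame) -> List[Detection]:
        return [("car", 0.9), ("person", 0.8)]


print(run_pipeline(range(10), DummyDetector(), CountingModule()))
```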
|
Nayeth I. Solorzano Alcivar, R. L., Stalyn Gonzabay Yagual, & Boris X. Vintimilla. (2020). Statistical Representations of a Dashboard to Monitor Educational Videogames in Natural Language. In ETLTC – ACM Chapter: International Conference on Educational Technology, Language and Technical Communication; Fukushima, Japan, 27-31 January 2020 (Vol. 77).
Abstract: This paper explains how Natural Language (NL) processing by computers, through smart programs as a form of Machine Learning (ML), can represent large sets of quantitative data as written statements. The study recognized the need to improve the implemented web platform using a dashboard in which we collected an extensive set of data to measure assessment factors of the use of children's educational games. In this case, applying NL is a strategy to give assessments and to build and display more precise written statements that enhance the understanding of children's gaming behavior. We propose the development of a new tool to assess the use of written explanations rather than a statistical representation of feedback information, for the comprehension of parents and teachers who lack primary-level knowledge of statistics. Applying fuzzy logic theory, we present verbatim explanations of children's behavior while playing educational videogames as an NL interpretation instead of statistical representations. An educational series of digital game applications for mobile devices, identified as MIDI (Spanish acronym of “Interactive Didactic Multimedia for Children”) and linked to a dashboard in the cloud, is evaluated using the dashboard metrics. MIDI games tested in local primary schools help to evaluate the results of using the proposed tool. The guiding results allow analyzing the degrees of playability and usability factors obtained from the data produced when children play a MIDI game. The results obtained are presented in a comprehensive guiding evaluation report applying NL for parents and teachers. These guiding evaluations are useful to enhance the understanding of children's learning related to the school curricula applied to ludic digital games.
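To illustrate how fuzzy logic can turn a dashboard metric into a written statement, a toy sketch follows. The triangular membership functions, term names, and generated sentence are assumptions for illustration and do not reproduce the MIDI dashboard's actual rules.
```python
def triangular(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def describe_playability(score):              # score assumed in [0, 100]
    terms = {
        "low":    triangular(score, -1, 0, 50),
        "medium": triangular(score, 25, 50, 75),
        "high":   triangular(score, 50, 100, 101),
    }
    dominant = max(terms, key=terms.get)      # linguistic term with highest membership
    return f"The child shows a {dominant} level of playability in this game."


print(describe_playability(82))   # -> "... a high level of playability ..."
```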
|
Cristhian A. Aguilera, C. A., Cristóbal A. Navarro, & Angel D. Sappa. (2020). Fast CNN Stereo Depth Estimation through Embedded GPU Devices. Sensors 2020, 20(11), pp. 1–13.
Abstract: Current CNN-based stereo depth estimation models can barely run under real-time constraints on embedded graphics processing unit (GPU) devices. Moreover, state-of-the-art evaluations usually do not consider model optimization techniques, so the actual potential of embedded GPU devices remains unknown. In this work, we evaluate two state-of-the-art models on three different embedded GPU devices, with and without optimization methods, presenting performance results that illustrate the actual capabilities of embedded GPU devices for stereo depth estimation. More importantly, based on our evaluation, we propose the use of a U-Net-like architecture for postprocessing the cost volume, instead of a typical sequence of 3D convolutions, drastically increasing the runtime speed of current models. In our experiments, we achieve real-time inference speed, in the range of 5–32 ms, for 1216×368 input stereo images on the Jetson TX2, Jetson Xavier, and Jetson Nano embedded devices.
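A minimal sketch of the general idea, assuming PyTorch, is shown below: the disparity dimension of the cost volume is treated as channels and refined with a small U-Net-style 2D encoder-decoder, followed by a soft-argmin to obtain a dense disparity map. Layer sizes and the quarter-resolution cost volume are illustrative assumptions, not the paper's exact configuration.
```python
import torch
import torch.nn as nn


class UNetCostRefiner(nn.Module):
    def __init__(self, max_disp=64, ch=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(max_disp, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.dec1 = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.out = nn.Conv2d(ch * 2, max_disp, 3, padding=1)   # back to one score per disparity

    def forward(self, cost_volume):                 # (B, max_disp, H, W)
        e1 = self.enc1(cost_volume)
        e2 = self.enc2(e1)
        d1 = self.dec1(e2)
        refined = self.out(torch.cat([d1, e1], dim=1))          # skip connection, U-Net style
        # soft-argmin over the disparity dimension -> dense disparity map
        prob = torch.softmax(-refined, dim=1)
        disparities = torch.arange(refined.shape[1], device=refined.device).view(1, -1, 1, 1)
        return (prob * disparities).sum(dim=1)


refiner = UNetCostRefiner()
disp = refiner(torch.rand(1, 64, 368 // 4, 1216 // 4))   # quarter-resolution cost volume
print(disp.shape)                                        # torch.Size([1, 92, 304])
```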
|