Yamin Sepehri
Omnidirectional images are spherical visual signals that provide a full 360° view of a scene from a specific position. Such images are becoming increasingly popular in fields like virtual reality and robotics. Compared to conventional 2D images, omnidirectional signals have much higher storage and bandwidth requirements because of their spherical nature. Thus, there is a need for image compression schemes that reduce the storage space dedicated to omnidirectional images. Image compression algorithms can be broadly classified into two groups: lossless and lossy. Lossless schemes reconstruct the original data exactly, but they cannot reduce the size beyond a certain limit. Lossy methods are generally the better solution, provided they achieve a good compression rate without adding high visual distortion to the reconstructed image. Applying a planar, lossy image compression scheme to omnidirectional images causes problems, however. A planar compression scheme can be applied to a projected version of a 360° image, but in such projections (for example, the equirectangular projection) the sampling rate near the poles differs from the sampling rate near the center of the image. Consequently, the filters of planar compression schemes, which do not account for this difference, produce suboptimal results and distortions in the reconstructed images. Recently, following the success of deep neural networks in many image processing tasks, researchers have begun to use them for image compression as well. In this study, we propose a deep learning-based method for the compression of omnidirectional images that combines state-of-the-art approaches from deep learning-based image compression with special convolutional layers that take the geometry of the omnidirectional image into account.
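The difference in sampling rate between the poles and the center of an equirectangular image can be illustrated with a short sketch. It relies on the standard geometry of the projection: every pixel row has the same number of samples, while the circle of latitude it represents shrinks in proportion to the cosine of the latitude. The function names and the image height used here are illustrative assumptions, not part of the proposed method.

```python
import numpy as np

def row_latitudes(height):
    """Latitude (radians) at the center of each pixel row, top (+pi/2) to bottom (-pi/2)."""
    rows = np.arange(height)
    return np.pi / 2 - (rows + 0.5) * np.pi / height

def horizontal_oversampling(height):
    """Horizontal oversampling of each row relative to the equator.

    Every row has the same number of pixels, but the circle it covers on the
    sphere has circumference proportional to cos(latitude), so the sampling
    density grows as 1 / cos(latitude) toward the poles.
    """
    return 1.0 / np.cos(row_latitudes(height))

# Illustrative example: an equirectangular image with 8 rows (assumed size).
factors = horizontal_oversampling(8)
for row, factor in enumerate(factors):
    print(f"row {row}: oversampling factor {factor:.2f}")
```

Rows close to the poles are sampled several times more densely than rows near the center, which is why a filter with a fixed planar footprint covers very different areas of the sphere depending on the row.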
Compared to the available methods, the proposed method is the first that can be applied directly to the equirectangular projection of omnidirectional images while accounting for the geometry both in the overall scheme and in the layers themselves. In developing this method, we tried different geometry-aware convolutional layers and explored various methods of downsampling and upsampling, such as spherical pooling layers, strided or transposed convolutions, bilinear interpolation, and pixel shuffle. The final method uses specific spherical convolutional layers whose sampling patterns take the geometry of omnidirectional images into account: the sampling positions vary with the height in the image, following the nature of the projected omnidirectional image. Additionally, because it uses an iterative training method that computes the residual between the output and the input and feeds it back to the network as the input of the next iteration, it can provide different compression rates with just one pass of training. Finally, it uses a novel patching method that is well aligned with the spherical convolution layers and allows the method to run efficiently without requiring high computational power. The model was compared with a similar architecture without spherical convolutions and spherical patching and showed some improvements. The architecture has been optimized and improved, and it has the potential to compete with popular image compression schemes such as JPEG, especially in terms of reconstructing colors.
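The iterative residual scheme can be sketched as follows. This is a minimal illustration only: `toy_encode` and `toy_decode` are hypothetical stand-ins (a one-bit sign code and a fixed, shrinking gain) for the learned encoder and decoder, and they are not the network described here. The sketch only shows the control flow, in which each iteration codes the residual left by the previous ones.

```python
import numpy as np

def toy_encode(residual):
    """Hypothetical stand-in for the learned encoder: one binary code (+1/-1) per pixel."""
    return np.where(residual >= 0, 1.0, -1.0)

def toy_decode(code, iteration):
    """Hypothetical stand-in for the learned decoder: scale the code by a shrinking gain."""
    return code * 0.25 * (0.5 ** iteration)

def progressive_code(image, num_iterations):
    """Code `image` (values in [0, 1]) over several residual iterations.

    Returns the list of per-iteration codes and the mean squared error
    of the reconstruction after each iteration.
    """
    reconstruction = np.full_like(image, 0.5)  # start from mid-gray
    codes, errors = [], []
    for k in range(num_iterations):
        residual = image - reconstruction      # what is still missing
        code = toy_encode(residual)            # code only the residual
        reconstruction = reconstruction + toy_decode(code, k)
        codes.append(code)
        errors.append(float(np.mean((image - reconstruction) ** 2)))
    return codes, errors

# Illustrative input: a small random "image" (assumed data, not from the study).
rng = np.random.default_rng(0)
image = rng.random((4, 8))
codes, errors = progressive_code(image, num_iterations=6)
for k, err in enumerate(errors, start=1):
    print(f"after {k} iteration(s): MSE = {err:.6f}")
```

Keeping only the first few codes yields a lower bit rate with a coarser reconstruction, while keeping more codes yields a higher rate with a finer one, which is how a single trained model of this kind can serve several compression rates.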
2020