Publication

Computational Aesthetics and Image Enhancements using Deep Neural Networks

Bin Jin
2018
EPFL thesis
Abstract

Imaging devices have become ubiquitous in modern life, and many of us capture an increasing number of images every day. When we choose to share or store some of these images, we tend to select the most visually pleasing ones. Yet quantifying visual pleasantness is a challenge, as image aesthetics correlate not only with low-level image quality, such as contrast, but also with high-level visual processes, like composition and context. For most users, a considerable amount of manual effort and/or professional knowledge is required to get aesthetically pleasing images. Developing automatic solutions thus benefits a large community. This thesis proposes several computational approaches to help users obtain the desired images. The first technique aims at automatically measuring aesthetic quality, which helps users select and rank images. We formulate the aesthetics prediction problem as a regression task and train a deep neural network on a large image aesthetics dataset. The unbalanced distribution of aesthetics scores in the training set can bias the trained model towards certain aesthetics levels. Therefore, we propose to add sample weights during training to overcome such bias. Moreover, we build a loss function on the histograms of user labels, thus enabling the network to predict not only the average aesthetic quality but also the difficulty of such predictions. Extensive experiments demonstrate that our model outperforms the previous state-of-the-art by a notable margin. Additionally, we propose an image cropping technique that automatically outputs aesthetically pleasing crops. Given an input image and a certain template, we first extract a sufficient number of candidate crops. These crops are then ranked according to the scores predicted by the pre-trained aesthetics network, after which the best crop is output to the user. We conduct psychophysical experiments to validate the performance.
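The sample-weighting idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so the inverse-frequency scheme, the bin width, and the function names `inverse_frequency_weights` and `weighted_mse` below are all assumptions made for illustration, not the thesis's actual method.

```python
from collections import Counter


def inverse_frequency_weights(scores, bin_width=1.0):
    """Assumed scheme: weight each sample by the inverse frequency of its score bin."""
    bins = [int(score // bin_width) for score in scores]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    # Rescale so the weights average to 1, which keeps the loss scale comparable.
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_mse(predictions, targets, weights):
    """Weighted mean squared error for a regression target."""
    total = sum(w * (p - t) ** 2 for p, t, w in zip(predictions, targets, weights))
    return total / sum(weights)


# Hypothetical mean aesthetics scores: most images cluster around 5,
# while very low and very high scores are rare.
scores = [5.1, 5.3, 5.6, 5.8, 2.2, 8.4]
weights = inverse_frequency_weights(scores)
```

With weights like these, errors on the rare low-score and high-score images count more in the loss than errors on the common mid-range images, which is one way to counteract an unbalanced score distribution.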
We further present a keyword-based image color re-rendering algorithm. For this task, the colors in the input image are modified to be more visually appealing according to the keyword specified by the user. Our algorithm applies local color re-rendering operations to achieve this goal. A novel weakly-supervised semantic segmentation algorithm is developed to locate the keyword-related regions where the color re-rendering operations are applied. The color re-rendering process benefits from the segmentation network in two ways. First, we achieve more accurate correlation measurements between keywords and color characteristics, which leads to better color re-rendering results. Second, the artifacts caused by the color re-rendering operations are significantly reduced. To avoid the need for keywords when enhancing image aesthetics, we explore generative adversarial networks (GANs) for automatic image enhancement. GANs are known for directly learning the transformations between images from the training data. To learn the image enhancement operations, we train the GANs on an aesthetics dataset with three different losses combined. The first two are standard generative losses that constrain the generated images to be natural and similar in content to the input images. We propose a third, aesthetics loss that aims at improving the aesthetic quality of the generated images. Overall, the three losses together direct the GANs to apply appropriate image enhancement operations.
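The combination of three losses described above might be sketched as a weighted sum. The abstract gives neither the exact form of each term nor how they are weighted, so every formula, weight, and name here (`adversarial_loss`, `content_loss`, `aesthetics_loss`, `total_generator_loss`, the 0 to 10 score range) is an assumption chosen only to make the idea concrete.

```python
import math


def adversarial_loss(disc_prob_real):
    """Assumed naturalness term: small when the discriminator rates the output as real."""
    return -math.log(max(disc_prob_real, 1e-12))


def content_loss(generated, original):
    """Assumed content term: mean squared difference between flattened pixel values."""
    return sum((g - o) ** 2 for g, o in zip(generated, original)) / len(original)


def aesthetics_loss(predicted_score, max_score=10.0):
    """Assumed aesthetics term: small when the predicted aesthetics score is high."""
    return (max_score - predicted_score) / max_score


def total_generator_loss(generated, original, disc_prob_real, predicted_score,
                         weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are hypothetical hyperparameters."""
    w_adv, w_content, w_aes = weights
    return (w_adv * adversarial_loss(disc_prob_real)
            + w_content * content_loss(generated, original)
            + w_aes * aesthetics_loss(predicted_score))


# Toy values: a two-pixel "image", a discriminator probability, and a predicted score.
loss_low_score = total_generator_loss([0.4, 0.6], [0.5, 0.5], 0.8, 4.0)
loss_high_score = total_generator_loss([0.4, 0.6], [0.5, 0.5], 0.8, 7.0)
```

In this sketch, raising the predicted aesthetics score lowers the total loss while the other two terms keep the output natural and close to the input, which mirrors the roles the abstract assigns to the three losses.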

Related concepts (39)
Image editing
Image editing encompasses the processes of altering images, whether they are digital photographs, traditional photo-chemical photographs, or illustrations. Traditional analog image editing is known as photo retouching, using tools such as an airbrush to modify photographs or editing illustrations with any traditional art medium. Graphic software programs, which can be broadly grouped into vector graphics editors, raster graphics editors, and 3D modelers, are the primary tools with which a user may manipulate, enhance, and transform images.
Deep learning
Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
Digital image
A digital image is an image composed of picture elements, also known as pixels, each with a finite, discrete numeric representation of its intensity or gray level, given as the output of a two-dimensional function of its spatial coordinates x and y. Depending on whether the image resolution is fixed, it may be of vector or raster type. Raster images have a finite set of digital values, called picture elements or pixels.
Related publications (121)

Aggregating Spatial and Photometric Context for Photometric Stereo

David Honzátko

Photometric stereo, a computer vision technique for estimating the 3D shape of objects through images captured under varying illumination conditions, has been a topic of research for nearly four decades. In its general formulation, photometric stereo is an ...
EPFL, 2024

Fashioning Creative Expertise with Generative AI: Graphical Interfaces for GAN-Based Design Space Exploration Better Support Ideation Than Text Prompts for Diffusion Models

Pierre Dillenbourg, Richard Lee Davis, Kevin Gonyop Kim, Thiemo Wambsganss, Wei Jiang

This paper investigates the potential impact of deep generative models on the work of creative professionals. We argue that current generative modeling tools lack critical features that would make them useful creativity support tools, and introduce our own ...
2024

Deep Learning Generalization with Limited and Noisy Labels

Mahsa Forouzesh

Deep neural networks have become ubiquitous in today's technological landscape, finding their way in a vast array of applications. Deep supervised learning, which relies on large labeled datasets, has been particularly successful in areas such as image cla ...
EPFL, 2023
Related MOOCs (32)
Neuronal Dynamics 2- Computational Neuroscience: Neuronal Dynamics of Cognition
This course explains the mathematical and computational models that are used in the field of theoretical neuroscience to analyze the collective dynamics of thousands of interacting neurons.
Neuronal Dynamics - Computational Neuroscience of Single Neurons
The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.