In this paper, we address the problem of learning to summarize personal photo albums. Given a photo album, we aim to select a small set of representative images such that the extracted summary captures most of the story told through the images. Specifically, we extend a recently proposed recurrent neural network-based framework by employing a more effective image representation and, more importantly, by adding a diversity term to the main objective. Our diversity term is based on jointly training a discriminator network that evaluates the diversity of the selected images. This alleviates the selection of near-duplicate or semantically similar images, which is the primary shortcoming of the base approach. Experimental results show that our improved model produces better or comparable summaries, providing a good balance between quality and diversity.
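
To make the idea of adding a diversity term to the main objective concrete, the following is a minimal toy sketch, not the paper's implementation. Everything in it is an assumption made for illustration: the names `redundancy`, `combined_objective`, and `diversity_weight` are hypothetical, and the hand-written mean pairwise cosine similarity is only a stand-in for the jointly trained discriminator network described above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def redundancy(features):
    """Hypothetical stand-in for the discriminator: mean pairwise cosine
    similarity of the selected images' feature vectors. Higher values mean
    the selection is more redundant (less diverse)."""
    pairs = [(i, j) for i in range(len(features)) for j in range(i + 1, len(features))]
    if not pairs:
        return 0.0
    return sum(cosine(features[i], features[j]) for i, j in pairs) / len(pairs)


def combined_objective(quality_loss, features, diversity_weight=0.5):
    """Assumed form of the objective: main loss plus a weighted diversity penalty."""
    return quality_loss + diversity_weight * redundancy(features)


# Two candidate summaries with the same (made-up) quality loss.
near_duplicates = [[1.0, 0.0], [1.0, 0.01]]
diverse = [[1.0, 0.0], [0.0, 1.0]]

loss_dup = combined_objective(1.0, near_duplicates)
loss_div = combined_objective(1.0, diverse)
print(loss_dup > loss_div)
```

Under these assumptions, a selection of near-duplicate images is penalized more than a diverse one with the same quality loss, which is the behavior the diversity term is meant to encourage.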
@inproceedings{ozkose2019IPTA,
title={Diverse Neural Photo Album Summarization},
author={Yunus Emre Ozkose and Bora Celikkale and Erkut Erdem and Aykut Erdem},
booktitle={International Conference on Image Processing Theory, Tools and Applications (IPTA)},
year={2019}
}