Data-Driven Image Captioning via Salient Region Discovery
Journal Article
Abstract: In the past few years, automatically generating descriptions for images has attracted considerable attention in computer vision and natural language processing research. Among existing approaches, data-driven methods have proven highly effective. These methods compare a given image against a large set of training images to retrieve a set of relevant images, then generate a description from their associated captions. In this study, we propose to integrate an object-based semantic image representation into a deep features-based retrieval framework to select the relevant images. Moreover, we present a novel phrase selection paradigm and a sentence generation model that depends on a joint analysis of salient regions in the input and retrieved images within a clustering framework. We demonstrate the effectiveness of our proposed approach on the Flickr8K and Flickr30K benchmark datasets and show that our model gives highly competitive results compared with state-of-the-art models.

@article{kilickaya_datadriven,
  title   = {Data-Driven Image Captioning via Salient Region Discovery},
  author  = {Mert Kilickaya and Burak Kerim Akkus and Ruket Cakici and Aykut Erdem and Erkut Erdem and Nazli Ikizler-Cinbis},
  journal = {IET Computer Vision},
  volume  = {XXX},
  number  = {XXX},
  pages   = {XXX}
}