Publication - MSVD-Turkish: A Large-Scale Dataset for Video Captioning in Turkish (in Turkish)

Type
National Conference Publication

Authors
Begum Citamak, Menekse Kuyu, Aykut Erdem, Erkut Erdem

Published/Presented on
27th IEEE Signal Processing and Communications Applications Conference (SIU) 2019

Publication Page
No information

Supplementary Materials
No information

Abstract
Automatically generating natural language descriptions for videos, also known as video captioning, has recently been introduced as a challenging problem that integrates vision and language. Although researchers have demonstrated numerous solutions for English, to date there has been no study on Turkish because suitable datasets for training Turkish video captioning models have been lacking. To address this, we construct a large-scale Turkish benchmark dataset by carefully translating the English descriptions in the MSVD dataset into Turkish. Moreover, we implement several neural models, including LSTM-based sequence-to-sequence architectures with temporal attention mechanisms, and report the performance of these strong baselines on our dataset. We hope that our dataset will serve as a good resource for future efforts on Turkish video captioning.
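
The temporal attention mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the toy feature values, and the plain dot-product scoring are all assumptions chosen for brevity, and a trained model would typically use learned projections inside an LSTM decoder. The sketch only shows the core idea, that at each decoding step the model weights the per-frame features by their relevance to the current decoder state and sums them into a context vector.

```python
import math


def temporal_attention(frame_features, decoder_state):
    """Return (weights, context) for one decoding step.

    frame_features: list of T per-frame feature vectors (lists of floats).
    decoder_state: current decoder hidden state with the same dimensionality.
    Scoring is a plain dot product, an assumption made for brevity.
    """
    # Relevance score of each frame to the current decoder state.
    scores = [sum(f * s for f, s in zip(frame, decoder_state))
              for frame in frame_features]
    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the frame features.
    dim = len(frame_features[0])
    context = [sum(w * frame[d] for w, frame in zip(weights, frame_features))
               for d in range(dim)]
    return weights, context


# Toy input: three frames with 2-dimensional features (hypothetical values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [0.5, 2.0]
weights, context = temporal_attention(frames, state)
```

In a captioning decoder, the context vector would be recomputed at every word, so that different words can attend to different parts of the video.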

BibTeX

@inproceedings{Citamak2019SIU,
title={MSVD-Turkish: A Large-Scale Dataset for Video Captioning in Turkish},
author={Begum Citamak and Menekse Kuyu and Aykut Erdem and Erkut Erdem},
booktitle={Signal Processing and Communications Applications Conference (SIU) 2019},
pages={1--4},
year={2019},
organization={IEEE}
}