Image Caption Generator Using CNN and RNN

Sagar Shrestha
Aawaj Shrestha
2020
BSc.CSIT
Semester 7

In the past few years, automatically generating descriptive sentences for images has attracted growing interest in natural language processing and computer vision research. Image captioning is a fundamental task that requires a semantic understanding of images and the ability to generate well-formed, grammatically correct descriptions. In this study, we propose a hybrid system that uses multilayer Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) models to generate vocabulary describing the images, and a Long Short-Term Memory (LSTM) network to arrange the generated keywords into meaningful sentences. We demonstrate the effectiveness of the proposed model on the Flickr8K and Flickr30K datasets and show that it gives superior results. We discuss the foundations of these techniques and analyze their performance, strengths, and limitations. We also discuss the datasets and evaluation metrics commonly used in deep learning based automatic image captioning.
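
The abstract does not name the evaluation metrics it discusses. As an illustration only, the sketch below computes clipped n-gram precision, the building block of BLEU-style scores that are often used to compare a generated caption against human reference captions. The function names and the example captions are hypothetical and are not taken from the project.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n=1):
    """Clipped (modified) n-gram precision of a candidate caption.

    Each candidate n-gram is credited at most as many times as it
    appears in any single reference caption.
    """
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # Highest count of each n-gram across the individual references.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())


# Hypothetical captions, made up for illustration only.
candidate = "a dog runs on the grass".split()
references = [
    "a dog is running on the grass".split(),
    "the dog runs across a field".split(),
]
unigram_p = clipped_precision(candidate, references, n=1)
bigram_p = clipped_precision(candidate, references, n=2)
```

A full BLEU score would additionally combine several n-gram orders and apply a brevity penalty; this sketch stops at the per-order precision.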

Caption Generator
image recognition
long short-term memory
