This paper is published in Volume-7, Issue-3, 2021
Area
Information Technology
Author
Suraj Dahake, Aditya Ohekar, Shubham Ilag, Aasim Shah
Org/Univ
Datta Meghe College of Engineering, Navi Mumbai, Maharashtra, India
Keywords
Real-Time, Captioning, Image, CNN, TensorFlow
Citations
IEEE
Suraj Dahake, Aditya Ohekar, Shubham Ilag, Aasim Shah, "RICA: Real-Time Image Captioning Application," International Journal of Advance Research, Ideas and Innovations in Technology, vol. 7, no. 3, 2021, www.IJARIIT.com.
APA
Suraj Dahake, Aditya Ohekar, Shubham Ilag, Aasim Shah (2021). RICA: Real-Time Image Captioning Application. International Journal of Advance Research, Ideas and Innovations in Technology, 7(3). www.IJARIIT.com.
MLA
Suraj Dahake, Aditya Ohekar, Shubham Ilag, Aasim Shah. "RICA: Real-Time Image Captioning Application." International Journal of Advance Research, Ideas and Innovations in Technology 7.3 (2021). www.IJARIIT.com.
Abstract
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. An image caption generator draws on both fields to recognize the context of an image and describe it in a natural language such as English. Recent advances in deep-learning-based machine translation and computer vision have produced excellent image captioning models that use advanced techniques such as deep reinforcement learning. Although these models are very accurate, they often rely on expensive computational hardware, which makes them difficult to apply in real-time scenarios, where their practical applications can be realized. In this paper, we examine some of the core concepts of image captioning and its common approaches, and present our simple encoder-decoder implementation with significant modifications and optimizations that enable it to run on the low-end hardware of hand-held devices. We also compare our results, evaluated using various metrics, with state-of-the-art models, and analyze why and where our model, trained on the MSCOCO dataset, falls short because of the trade-off between computation speed and quality. Using Google's TensorFlow framework, we also implement a first-of-its-kind Android application to demonstrate the real-time applicability and optimizations of our approach.
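The abstract does not spell out how the encoder-decoder model produces a caption at inference time. As a minimal sketch, assuming a standard captioner in which a CNN encoder turns the image into a feature vector and a decoder then emits one word at a time with greedy decoding, the generation loop looks roughly like this. The names `greedy_caption`, `toy_step` and the bigram table are hypothetical; the toy table stands in for a trained TensorFlow decoder and ignores the image features. This is not the authors' code.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical decoder step: maps (image features, tokens so far) to a score
# for each candidate next word. In a real captioner this would be an RNN/LSTM
# or Transformer decoder conditioned on CNN image features.
DecoderStep = Callable[[Sequence[float], List[str]], Dict[str, float]]


def greedy_caption(features: Sequence[float], step: DecoderStep,
                   start: str = "<start>", end: str = "<end>",
                   max_len: int = 20) -> List[str]:
    """Build a caption by always taking the highest-scoring next word."""
    tokens = [start]
    for _ in range(max_len):
        scores = step(features, tokens)
        next_word = max(scores, key=scores.get)  # greedy choice
        if next_word == end:
            break
        tokens.append(next_word)
    return tokens[1:]  # drop the start token


def toy_step(features: Sequence[float], tokens: List[str]) -> Dict[str, float]:
    # Hypothetical stand-in for a trained decoder: a fixed bigram table
    # keyed on the previous word; the image features are ignored here.
    table = {
        "<start>": {"a": 0.9, "dog": 0.1},
        "a": {"dog": 0.8, "cat": 0.2},
        "dog": {"runs": 0.7, "<end>": 0.3},
        "runs": {"<end>": 1.0},
    }
    return table[tokens[-1]]


print(" ".join(greedy_caption([0.0] * 4, toy_step)))  # a dog runs
```

Greedy decoding runs exactly one decoder step per output word, which makes it the cheapest search strategy; beam search typically improves caption quality at the cost of several decoder steps per word. That is one concrete instance of the speed-versus-quality trade-off the abstract mentions, though the abstract does not state which strategy the authors used.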