TechRxiv

Domain-Specific Image Captioning: A Comprehensive Review

preprint
posted on 2022-07-19, 03:15 authored by Himanshu Sharma, Devanand Padha

Image captioning is the task of generating a sentence that summarizes the semantic content of an image. Describing the essential information of an image in the form of a statement draws on both computer vision and natural language processing. Prior research frequently addressed this task using standard machine learning approaches that model such systems on hand-engineered features extracted from the input data. With the resurgence of deep learning, image captioning has achieved state-of-the-art results in numerous application domains, including visual aid, traffic assistance, medical support, and remote sensing. This paper thoroughly reviews application-domain-based image captioning frameworks, covering both classical and modern deep learning-based architectures. We further propose a taxonomy that characterizes these frameworks by their distinct technological foundations and their performance across diverse datasets. We also describe the evaluation metrics used to assess image captioning models across several domains. Finally, we highlight open research problems concerning such models from a forward-looking standpoint.

History

Email Address of Submitting Author

himanshusharma.csit@cujammu.ac.in

ORCID of Submitting Author

0000-0001-5299-6143

Submitting Author's Institution

Central University of Jammu

Submitting Author's Country

  • India