Photographic Text-to-Image Synthesis via Multi-turn Dialogue Using Attentional GAN
DOI:
https://doi.org/10.3126/jsce.v8i0.32860

Keywords:
GAN, MultiTurnGAN, Text-to-image, Image generation, Realistic image synthesis

Abstract
Generating images that appear natural is not easy. To address this problem, this paper introduces a novel approach to synthesizing a photo-realistic image from a caption. The user can adjust the image's highlights turn by turn through successive captions, integrating natural intelligence into the loop. Each input is passed to a dialogue state tracker to extract a context feature, from which the generator produces an image. If the image does not meet the user's expectations, the user supplies another utterance, and the system conditions the next generation on both the new input and the previous image. In this manner, the user can progressively visualize the image they imagine. We performed extensive experiments on two datasets, CUB and COCO, generating a realistic image at each turn, and obtained an Inception Score (IS) of 4.38 ± 0.05 and R-precision of 67.96 ± 5.27% on the CUB dataset, and an IS of 26.12 ± 0.24 and R-precision of 91.00 ± 2.31% on the COCO dataset. The work could further be enhanced toward high-quality image synthesis, voice integration, video generation from stories, and so on. This research is limited to 256×256 images at each turn.
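The turn-by-turn conditioning described in the abstract can be sketched as a simple loop: a dialogue state tracker folds each new caption into a running context, and the generator conditions on that context plus the previous turn's image. The sketch below is a toy illustration only; `encode_dialogue` and `generate_image` are hypothetical stand-ins (a running average and a tanh projection) for the paper's learned tracker and GAN generator, and the 8×8 output stands in for the 256×256 images the paper produces.

```python
import numpy as np

def encode_dialogue(caption, prev_context):
    # Hypothetical dialogue state tracker: a fixed-length caption
    # embedding folded into the running context by averaging
    # (a stand-in for the paper's learned tracker).
    emb = np.array([float(ord(c) % 7) for c in caption[:8]])
    emb = np.pad(emb, (0, 8 - len(emb)))
    if prev_context is None:
        return emb
    return 0.5 * (prev_context + emb)

def generate_image(context, prev_image):
    # Hypothetical generator stub: conditions on the dialogue context
    # and, from the second turn on, on the previous turn's image.
    base = np.tanh(np.outer(context, context))
    if prev_image is not None:
        base = 0.5 * (base + prev_image)
    return base

def multi_turn_synthesis(captions):
    # One image per turn; each turn sees the updated context and
    # the image produced on the previous turn.
    context, image = None, None
    for caption in captions:
        context = encode_dialogue(caption, context)
        image = generate_image(context, image)
    return image

img = multi_turn_synthesis(["a small yellow bird", "with black wings"])
print(img.shape)  # (8, 8)
```

The key design point mirrored here is that the second turn does not start from scratch: both the dialogue context and the previous image carry forward, which is what lets the user refine rather than regenerate.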
Downloads
Published
License
Copyright (c) 2020 Khwopa Engineering College and Khwopa College of Engineering

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This license allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.