
Author: Arces A. Talavera
Thesis title: Layout and Context Understanding for Image Synthesis with Scene Graphs
Advisor: Kai-Lung Hua (花凱龍)
Committee members: Kai-Lung Hua (花凱龍), Arnulfo Azcarraga, Hsing-Kuo Pao (鮑興國), Chuan-Kai Yang (楊傳凱), Chao-Lung Yang (楊朝龍)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2019
Graduation academic year: 107 (2018-2019)
Language: English
Number of pages: 43
Keywords (Chinese): Generative Models, Image Synthesis, Scene Graphs
Keywords (English): Generative Models, Image Synthesis, Scene Graphs
Abstract: Advances in text-to-image synthesis can generate remarkable images from textual descriptions. However, these methods are designed to generate only one object with varying attributes, and they struggle with complex descriptions containing multiple arbitrary objects, since such descriptions require information about the placement and size of each object in the image. Recently, a method that infers object layouts from scene graphs was proposed to address this problem. However, that method describes the layout using only object labels, which fails to capture the appearance of some objects. Moreover, the model is biased towards generating rectangular shapes in the absence of ground-truth masks. In this work, we propose an object encoding module that captures object features and feeds them to the image generation network as additional information. We also introduce a graph-cuts-based segmentation method that infers object masks from bounding boxes to better model object shapes. Our method produces more discernible images with more realistic shapes than those generated by the current state-of-the-art method.
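
The record does not show the thesis's actual implementation, so the sketch below is only a hedged illustration of the general idea of inferring an object mask from its bounding box with graph-cut segmentation; it uses OpenCV's GrabCut, a graph-cuts method initialized from a rectangle. The helper name mask_from_bbox and its arguments are hypothetical and are not the author's code.

    # Minimal sketch (assumption, not the thesis implementation): infer a binary
    # object mask from a bounding box via graph-cut segmentation (OpenCV GrabCut).
    import cv2
    import numpy as np

    def mask_from_bbox(image_bgr, bbox, iters=5):
        """image_bgr: HxWx3 uint8 image (BGR); bbox: (x, y, w, h) of the object."""
        mask = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
        # Internal GMM models that GrabCut updates for background and foreground.
        bgd_model = np.zeros((1, 65), dtype=np.float64)
        fgd_model = np.zeros((1, 65), dtype=np.float64)
        # Initialize from the rectangle: pixels outside are background, pixels
        # inside are "probably foreground"; GrabCut then refines the labels.
        cv2.grabCut(image_bgr, mask, tuple(bbox), bgd_model, fgd_model,
                    iters, cv2.GC_INIT_WITH_RECT)
        # Keep definite and probable foreground pixels as the object mask.
        return ((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)).astype(np.uint8)

A mask obtained this way could stand in for the rectangular box when composing the scene layout, which is the role the abstract describes for its graph-cuts step.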

Table of Contents:
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Works
  2.1 Related Works
    2.1.1 Image Generation from Text
    2.1.2 Image Generation from Semantic Layouts
3 Method
  3.1 Overview of Proposed Framework
  3.2 Mask Generation from Bounding Box
  3.3 Graph Convolution Network
  3.4 Object Encoding Module
  3.5 Layout Prediction
  3.6 Generator and Discriminators
    3.6.1 Generator
    3.6.2 Discriminators
  3.7 Objective
4 Experimental Results
  4.1 Implementation Details
    4.1.1 Network Architecture
    4.1.2 Training
  4.2 Dataset
  4.3 Ablation Study
    4.3.1 Depiction of Object Appearance
    4.3.2 Application of Masks to the Layout
    4.3.3 Predicted Layout
  4.4 User Study
5 Conclusion
  5.1 Future Work
References
