| Field | Value |
|---|---|
| Author | 朱昌駿 (Chang-Chun Chu) |
| Thesis title | 基於渲染遮罩式生成對抗網路之非成對物件轉換技術 (Unpaired Object Transformation Based on Rendering-Mask-Based Generative Adversarial Networks) |
| Advisor | 郭景明 (Jing-Ming Guo) |
| Committee members | 陳美勇 (Mei-Yung Chan), 王靖維 (Ching-Wei Wang), 王乃堅 (Nai-Jian Wang) |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of publication | 2018 |
| Academic year | 106 |
| Language | Chinese |
| Pages | 134 |
| Keywords | Generative Adversarial Networks, Object Transformation, Convolutional Neural Network, Deep Learning |
Generative adversarial networks (GANs) have driven progress in many fields: through adversarial training, a GAN learns the distribution of its training data and generates new samples that follow a similar distribution. A vanilla GAN consists of two independent networks: a generative network that synthesizes samples, and a discriminative network that judges how closely the synthesized samples resemble real ones.
Existing GAN-based object-transformation methods that work without paired training images, and without annotated regions of interest in either the training or the testing phase, introduce noticeable color distortion in regions that should not be translated at all, such as the background. To reduce this background distortion while keeping the manual annotation cost unchanged, our method has the generative network synthesize rendering masks through residual connections, and incorporates the smoothness of these masks into the optimization objective. The network can then transform objects without paired training data or predefined object locations while preserving the background of the original image. In both qualitative and quantitative experiments our method outperforms previous techniques; in visual evaluation it produces noticeably fewer background artifacts and preserves more visual information than the alternatives.
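The adversarial training described above is commonly formalized as the standard minimax objective of Goodfellow et al., restated here for context; the thesis itself may optimize one of the later variants (e.g. a least-squares or Wasserstein loss):

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

Here the discriminator $D$ maximizes its ability to separate real samples $x$ from generated samples $G(z)$, while the generator $G$ minimizes the same objective, pushing its output distribution toward $p_{\mathrm{data}}$.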
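The rendering-mask mechanism described in the abstract can be sketched as a per-pixel blend between the translated image and the original input, plus a smoothness (total-variation-style) penalty on the mask. This is a minimal illustration under assumed conventions (mask values in [0, 1]; the function and variable names are hypothetical), not the thesis's actual architecture:

```python
import numpy as np

def composite(input_img, translated_img, mask):
    """Per-pixel blend: mask = 1 keeps the translated pixel,
    mask = 0 keeps the original input pixel, so background
    regions with mask near 0 are preserved unchanged."""
    return mask * translated_img + (1.0 - mask) * input_img

def mask_smoothness(mask):
    """Total-variation-style penalty: sums absolute differences between
    vertically and horizontally neighboring mask values, encouraging
    smooth, uniform masks as an extra term in the training objective."""
    return (np.abs(np.diff(mask, axis=0)).sum()
            + np.abs(np.diff(mask, axis=1)).sum())

# Tiny example: a 2x2 image where only the top-left pixel is translated.
inp = np.zeros((2, 2))           # original image (all zeros)
gen = np.ones((2, 2))            # translated image (all ones)
m = np.array([[1.0, 0.0],
              [0.0, 0.0]])       # mask selects only the top-left pixel
out = composite(inp, gen, m)     # top-left becomes 1, the rest stays 0
```

A constant mask incurs zero smoothness penalty, while a mask with sharp spatial changes is penalized, which is one simple way to realize the "smoothness of the rendering mask" condition the abstract describes.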