
Author: Chang-Chun Chu (朱昌駿)
Title: Unpaired Object Transformation based on Rendering-mask-based Generative Adversarial Networks (基於渲染遮罩式生成對抗網路之非成對物件轉換技術)
Advisor: Jing-Ming Guo (郭景明)
Committee Members: Mei-Yung Chan (陳美勇), Ching-Wei Wang (王靖維), Nai-Jian Wang (王乃堅)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2018
Graduation Academic Year: 106 (ROC calendar, 2017–2018)
Language: Chinese
Number of Pages: 134
Keywords: Generative Adversarial Networks, Object Transformation, Convolutional Neural Network, Deep Learning
Abstract (Chinese):
    Generative adversarial networks have advanced development in many fields, because adversarial training allows the network to learn the distribution of the training data and to generate data with a similar distribution. A basic generative adversarial network is composed of two independent networks: the generative network synthesizes samples, and the discriminative network evaluates how closely the synthesized samples relate to the real ones.
    In existing GAN-based object transformation tasks where no paired training set is given and the region of interest is not annotated in either the training or testing phase, noticeable color distortion appears in regions that are not of interest, such as the background. In our proposed method, to reduce color distortion in the background while keeping the manual annotation cost unchanged, we let the generative network synthesize a rendering mask through residual connections and include the smoothness of the rendering mask as a condition in the network optimization. Without paired training sets or predefined object locations, the method can transform the object while preserving the background information of the original image. In both qualitative and quantitative experiments our method outperforms previous techniques, and in visual assessment it effectively reduces artifacts in the background and preserves more visual information than prior work.


Abstract (English):
    Generative adversarial networks promote development in a variety of fields because a GAN can learn the distribution of the given training samples and yield synthetic samples via adversarial training. The vanilla generative adversarial network is a combination of two independent networks: the generative network synthesizes fake samples, and the discriminative network evaluates the relevance between fake samples and real ones.
    In existing approaches to object transformation based on generative adversarial networks, which are given neither paired images nor annotations of the object of interest in the training and testing phases, the synthesized images show significant color changes in regions that should not be translated at all, such as the background. In our proposed method, in order to reduce color distortion in the background while keeping the manual annotation cost unchanged, we generate rendering masks via residual connections in the generative network and take their uniformity into consideration during optimization, so that the original background content is preserved without any paired training data or predefined regions. As demonstrated in our experimental results, the proposed approach exceeds previous works in both qualitative and quantitative evaluation, and in terms of visual perception it efficiently decreases artifacts and preserves conspicuously more visual information in the background than alternative methods.
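
For context, the adversarial loss and the cycle-consistency loss listed under Chapter 3 in the table of contents follow well-known formulations. A standard statement of the GAN minimax objective (Goodfellow et al.) and of the cycle-consistency term used for unpaired translation (CycleGAN) is given below; the thesis's own uniform loss and the weighting of the individual terms are defined only in the full text and are not reproduced here.

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

\mathcal{L}_{\mathrm{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]

where G and F denote the two generators mapping between the source and target domains, and D denotes a discriminator.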
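
As a rough sketch only, and not the thesis's actual implementation, the rendering-mask idea described in the abstracts can be illustrated as follows: the generator is assumed to predict both a translated image and a single-channel mask in [0, 1], the output is a mask-weighted blend of the translated image and the input so that low-mask (background) regions are passed through unchanged, and a total-variation-style penalty stands in for the uniform loss that encourages the mask to be smooth. The function names, the blending rule, and the exact form of the penalty are assumptions.

import torch

def compose_with_mask(x, translated, mask):
    # Blend translated content and the original input using the rendering mask:
    # where the mask is close to 0 (background) the input is kept,
    # where it is close to 1 (object) the translated content is used.
    return mask * translated + (1.0 - mask) * x

def uniform_loss(mask):
    # Total-variation-style smoothness penalty on the mask (N, 1, H, W);
    # used here as a plausible stand-in for the thesis's uniform loss (assumption).
    dh = (mask[:, :, 1:, :] - mask[:, :, :-1, :]).abs().mean()
    dw = (mask[:, :, :, 1:] - mask[:, :, :, :-1]).abs().mean()
    return dh + dw

# Hypothetical usage, assuming `generator` returns (translated, mask):
#   translated, mask = generator(x)
#   fake = compose_with_mask(x, translated, mask)
#   total = adv_loss(fake) + lam_cyc * cyc_loss(x, fake) + lam_u * uniform_loss(mask)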

Contents:
    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
        Background
        Research Motivation and Objectives
        Thesis Organization
    Chapter 2  Literature Review of Neural Networks and Generative Adversarial Networks
        2.1 Operation of Artificial Neural Networks
            2.1.1 Forward Propagation
            2.1.2 Backward Propagation
        2.2 Factors Affecting Neural Network Performance
        2.3 Convolutional Neural Networks
            2.3.1 Convolution
            2.3.2 Non-linear Activation Functions
            2.3.3 Pooling
            2.3.4 Training Methods
            2.3.5 Visualization
            2.3.6 Development of Convolutional Neural Network Architectures
        2.4 Generative Adversarial Networks (GAN)
    Chapter 3  Unpaired Object Transformation based on Rendering-mask-based Generative Adversarial Networks
        3.1 Loss Functions
            3.1.1 Adversarial loss
            3.1.2 Cycle-consistency loss
            3.1.3 Uniform loss
        3.2 Network Architecture and Training
            3.2.1 Generator Architecture
            3.2.2 Discriminator Architecture
        3.3 Experimental Results
            3.3.1 Datasets and Preprocessing
            3.3.2 Evaluation Metrics
            3.3.3 Comparison with Previous Methods
            3.3.4 Self-evaluation Comparison
    Chapter 4  Conclusion and Future Work
    References

