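Modularized Architecture of Attribute Generative Adversarial Network for Image Translation | National Taiwan University of Science and Technology Theses and Dissertations System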


Graduate student:	Kung-Hao Chen (陳孔豪)
Thesis title:	Modularized Architecture of Attribute Generative Adversarial Network for Image Translation (屬性生成對抗網路之模組化架構-應用於圖像轉換)
Advisor:	Bor-Shen Lin (林伯慎)
Committee members:	Bor-Shen Lin (林伯慎), Chuan-Kai Yang (楊傳凱), Nai-Wei Lo (羅乃維)
Degree:	Master
Department:	School of Management - Department of Information Management
Year of publication:	2021
Academic year of graduation:	109
Language:	Chinese
Number of pages:	47
Keywords (Chinese):	生成對抗網路、模組化架構、多任務圖像轉換
Keywords (English):	GAN, Modularized Architecture, Multi-task Image Translation

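In recent years, generative adversarial networks (GANs) have achieved major success in image translation. Among them, the cycle-consistent GAN (CycleGAN) made translation between domains possible, and many of its derived variants can now perform multi-task image translation. However, multi-task image translation requires adversarial training on all attributes simultaneously, which often takes a large amount of training time. It also means that when the translation of some attribute is unsatisfactory, or when a new attribute is needed, that attribute cannot be evaluated in isolation and all networks must be retrained. These limitations become a bottleneck for the maintenance, improvement, and application flexibility of multi-task image translation systems. To address this problem, this study proposes a novel multi-task image translation network with a modularized architecture, consisting of a generator, an adversarial discriminator, and several independent attribute discriminators that can be pretrained. Because each attribute discriminator is trained independently, the translation quality of an individual attribute is easier to track and improve, and a pretrained backbone network can be re-estimated to reduce training time. Our experiments verify that, with the modularized architecture and pretraining, the model generates cartoon images of quality comparable to the attribute generative adversarial network (AttGAN). We also verified the feasibility of training and translation with the modularized architecture on a real face image translation task. In addition, based on the modularized architecture, this study proposes an incremental training method. When an attribute is added, the system first loads the already trained network parameters and then updates them through adversarial training, without retraining the whole network. The attribute discriminators are all pretrained models; during adversarial training they only assist in training the generator, and their own parameters are not updated. Experimental results show that, when an attribute is added, incremental training speeds up convergence, effectively improves training efficiency, and generates images of quality similar to conventional multi-task image translation.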

In recent years, generative adversarial networks (GANs) have achieved great success in image translation. The cycle-consistent generative adversarial network (CycleGAN) performs cross-domain image translation without paired data, and extensions such as the attribute generative adversarial network (AttGAN) can translate among multiple domains. However, multi-task image translation requires the networks to be trained on all attributes at the same time, which not only takes a long time but also makes it inconvenient to add a new attribute. In addition, when the quality of the generated images for a specific attribute is unsatisfactory, it is hard to evaluate or improve that attribute alone, because the whole network has to be retrained. Such limitations become bottlenecks for the maintenance and improvement of the system.
To deal with this issue, this research proposes a modularized architecture for multi-task image translation, which comprises a generator, an adversarial discriminator, and several attribute discriminators that can be trained individually. With this architecture it is easier to evaluate, track, and improve the translation of each attribute, and pretrained backbone networks such as AlexNet can be used to reduce training time. Experimental results show that the modularized architecture with pretrained models achieves image quality comparable to AttGAN when tested on cartoon images and photos of real faces. In addition, an incremental training approach based on the modularized architecture is proposed. When a new attribute is added, the system simply loads the pretrained network parameters and updates them through adversarial training, instead of training the whole network from scratch. Experimental results show that incremental training speeds up convergence and improves training efficiency while generating images of quality comparable to non-modularized network architectures.
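
The division of labour described in the abstract can be illustrated with a deliberately tiny, dependency-free Python sketch. It is not the thesis's implementation: images are 3-element vectors, each attribute discriminator is a fixed logistic classifier standing in for a pretrained network, the generator only learns one offset vector per attribute, and the adversarial and reconstruction terms are omitted. All class and function names are hypothetical. What it does show is the mechanism: the frozen attribute modules supply gradients, only the generator is updated, and adding an attribute reuses the parameters already trained.

```python
import itertools
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class AttributeClassifier:
    """Stand-in for one pretrained attribute discriminator; never updated below."""

    def __init__(self, weights):
        self.weights = tuple(weights)  # immutable: the module stays frozen

    def prob(self, image):
        return sigmoid(sum(w * v for w, v in zip(self.weights, image)))


class Generator:
    """Toy generator: shifts the image by one learned offset vector per attribute."""

    def __init__(self, dim):
        self.dim = dim
        self.offsets = []  # trainable parameters, one vector per attribute

    def add_attribute(self):
        self.offsets.append([0.0] * self.dim)

    def translate(self, image, signs):
        out = list(image)
        for offset, s in zip(self.offsets, signs):
            for i in range(self.dim):
                out[i] += s * offset[i]
        return out


def train_generator(gen, classifiers, images, steps=200, lr=0.2):
    """Full-batch gradient descent on the summed attribute cross-entropy."""
    combos = list(itertools.product((0, 1), repeat=len(classifiers)))
    batch = len(images) * len(combos)
    for _ in range(steps):
        grads = [[0.0] * gen.dim for _ in gen.offsets]
        for image in images:
            for targets in combos:
                signs = [2 * t - 1 for t in targets]  # 0/1 target -> -1/+1
                out = gen.translate(image, signs)
                # Gradient w.r.t. the generated image, supplied by the
                # frozen classifiers: sum_j (p_j - t_j) * w_j.
                grad_out = [0.0] * gen.dim
                for clf, t in zip(classifiers, targets):
                    err = clf.prob(out) - t
                    for i in range(gen.dim):
                        grad_out[i] += err * clf.weights[i]
                # Chain rule: d(out) / d(offset_k) = sign_k.
                for g, s in zip(grads, signs):
                    for i in range(gen.dim):
                        g[i] += s * grad_out[i]
        # Only the generator's parameters are updated.
        for offset, g in zip(gen.offsets, grads):
            for i in range(gen.dim):
                offset[i] -= lr * g[i] / batch


def attribute_accuracy(gen, classifiers, images):
    """Fraction of (image, target, attribute) cases the translation satisfies."""
    combos = list(itertools.product((0, 1), repeat=len(classifiers)))
    hits = total = 0
    for image in images:
        for targets in combos:
            out = gen.translate(image, [2 * t - 1 for t in targets])
            for clf, t in zip(classifiers, targets):
                hits += int((clf.prob(out) > 0.5) == bool(t))
                total += 1
    return hits / total


# Synthetic "images" and hypothetical pretrained modules, one per attribute.
images = [[math.sin(i + d) for d in range(3)] for i in range(20)]
clf_a = AttributeClassifier([4.0, 0.0, 0.0])
clf_b = AttributeClassifier([0.0, 4.0, 0.0])
gen = Generator(dim=3)
gen.add_attribute()
gen.add_attribute()
train_generator(gen, [clf_a, clf_b], images)

# Incremental step: keep the trained offsets, add one slot and one module.
clf_c = AttributeClassifier([0.0, 0.0, 4.0])
gen.add_attribute()
train_generator(gen, [clf_a, clf_b, clf_c], images)
print(attribute_accuracy(gen, [clf_a, clf_b, clf_c], images))
```

In a real system the generator and discriminators would be convolutional networks trained with an automatic differentiation framework, and freezing a module would mean excluding its parameters from the optimizer rather than storing them in a tuple.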

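Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1 Research Background and Motivation
2 Research Contributions
3 Thesis Organization and Structure
Chapter 2 Literature Review
1 Convolutional Neural Networks
2 Generative Adversarial Networks
2.1 Adversarial Loss Function
3 Conditional Generative Adversarial Networks
4 Image Translation
4.1 Pix2pix
4.2 Cycle-Consistent Generative Adversarial Network (CycleGAN)
4.3 Multi-Domain Image Translation (StarGAN)
4.4 Attribute Generative Adversarial Network (AttGAN)
Chapter 3 Generative Adversarial Network with Modularized Architecture
1 Modularized Architecture
2 Backbone Networks for the Attribute Discriminators
3 Training Procedure for the Attribute Discriminators
4 Loss Functions
4.1 Wasserstein Adversarial Loss with Gradient Penalty
4.2 Reconstruction Loss
4.3 Attribute Classification Loss
4.4 Overall Objective Function
5 Complete MAGAN Training Procedure
6 Incremental Training
Chapter 4 Experimental Results and Analysis
1 Model Training Parameters
2 Cartoon Avatar Dataset
3 Accuracy of Attribute Discriminator Architectures
4 Baseline Model for Evaluation
5 MAGAN Experimental Results
6 Incremental Training Experimental Results
Chapter 5 Conclusion and Future Work
References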
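
The loss terms named in Sections 4.1 to 4.4 of Chapter 3 in the outline above are not spelled out in this record. As a hedged sketch only, the standard forms from the WGAN-GP and AttGAN literature would read as follows. The notation is an assumption and not taken from the thesis: G is the generator, D the adversarial discriminator, C_k the k-th pretrained attribute discriminator (held fixed), x an input image with original attributes a, b the target attributes, \hat{x} a point sampled on the line between a real and a generated image, and the \lambda values are weighting hyperparameters. The thesis's own definitions and weights may differ.

```latex
\begin{align}
\mathcal{L}_{D} &= \mathbb{E}_{x,b}\big[D(G(x,b))\big] - \mathbb{E}_{x}\big[D(x)\big]
  + \lambda_{gp}\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert\nabla_{\hat{x}} D(\hat{x})\rVert_2 - 1\big)^2\Big] \\
\mathcal{L}_{rec} &= \mathbb{E}_{x}\big[\lVert x - G(x,a)\rVert_1\big] \\
\mathcal{L}_{cls} &= \mathbb{E}_{x,b}\Big[\sum_{k} -b_k \log C_k\big(G(x,b)\big)
  - (1-b_k)\log\big(1 - C_k(G(x,b))\big)\Big] \\
\mathcal{L}_{G} &= -\mathbb{E}_{x,b}\big[D(G(x,b))\big]
  + \lambda_{rec}\,\mathcal{L}_{rec} + \lambda_{cls}\,\mathcal{L}_{cls}
\end{align}
```

Under this reading, D minimizes the first expression while G minimizes the last; because each C_k is fixed, the classification term contributes gradients to G only.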


                                

