
Author: I-Te Tsai (蔡奕德)
Title: Fast Learning a Stabilized Generative Model from a Single Image (快速且穩定地從單圖像中學習生成模型)
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Jun-Cheng Chen (陳駿丞), Yung-Yao Chen (陳永耀), Ching-Hu Lu (陸敬互), Chuan-kai Yang (楊傳凱)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110 (ROC calendar)
Language: English
Number of Pages: 40
Keywords (Chinese): 無條件圖像生成, 圖像生成, 膨脹內捲
Keywords (English): Unconditional Image Generation, Image Generation, Dilated Involution
    Abstract (Chinese): Given a large-scale dataset, generative adversarial networks achieve good results on image synthesis tasks. However, we cannot guarantee that a sufficiently large amount of data can be collected in every situation, so being able to learn a generative model from a single image would go a long way toward solving the data-scarcity problem. Training a generative adversarial network on only a single image is a difficult problem, because overfitting and training divergence occur frequently. In this paper, we propose a new operator, "dilated involution," which prevents the problem of channel redundancy and adapts to the local information at each position. We also propose a self-supervised discriminator that, through an image reconstruction function, keeps the generated images from diverging. Compared with previous methods, our method achieves state-of-the-art performance with fewer parameters and half the training time.


    Abstract (English): Generative Adversarial Networks (GANs) achieve good results when trained on large-scale datasets. However, collecting such datasets is often infeasible, so enabling GANs to learn from a few images, or even a single image, without overfitting is essential. This paper proposes the dilated involution operator, which prevents channel redundancy (learning redundant features) while adapting to the local information at each location. We also propose a self-supervised discriminator that doubles as a reconstruction function, preventing the generated images from diverging too much. Our method achieves state-of-the-art performance with fewer parameters and half the training time of previous methods.
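    As a rough illustration of the operator described in the abstracts, the following is a minimal PyTorch sketch of a dilated involution layer: per-position kernels are generated from the feature map (as in involution) and applied over a dilated neighborhood. The class name DilatedInvolution and the kernel_size, dilation, groups, and reduction parameters are illustrative assumptions, not the thesis's actual configuration.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class DilatedInvolution(nn.Module):
            """Involution with a dilated sampling grid (sketch).

            Each spatial position generates its own K x K kernel, shared
            across the channels of a group; the kernel is applied over a
            dilated neighborhood, enlarging the receptive field without
            adding weights. channels must be divisible by groups.
            """
            def __init__(self, channels, kernel_size=3, dilation=2,
                         groups=1, reduction=4):
                super().__init__()
                self.k, self.d, self.g = kernel_size, dilation, groups
                # Kernel-generation branch: per-pixel kernels from local features.
                self.reduce = nn.Sequential(
                    nn.Conv2d(channels, channels // reduction, 1),
                    nn.BatchNorm2d(channels // reduction),
                    nn.ReLU(inplace=True),
                )
                self.span = nn.Conv2d(channels // reduction,
                                      groups * kernel_size ** 2, 1)

            def forward(self, x):
                b, c, h, w = x.shape
                # 1) Generate a K*K kernel for every position and group.
                kernels = self.span(self.reduce(x))                # (B, G*K*K, H, W)
                kernels = kernels.view(b, self.g, self.k ** 2, h, w)
                # 2) Unfold a dilated K x K neighborhood around each position.
                pad = self.d * (self.k - 1) // 2                   # "same" padding
                patches = F.unfold(x, self.k, dilation=self.d, padding=pad)
                patches = patches.view(b, self.g, c // self.g, self.k ** 2, h, w)
                # 3) Weight each neighborhood by its position-specific kernel and sum.
                out = (kernels.unsqueeze(2) * patches).sum(dim=3)  # (B, G, C/G, H, W)
                return out.view(b, c, h, w)

        # Shape check: output matches the input resolution.
        layer = DilatedInvolution(channels=64)
        y = layer(torch.randn(1, 64, 32, 32))                      # -> (1, 64, 32, 32)

    Because the kernels are generated on the fly and shared across the channels of a group, the layer's learned parameters come only from the two 1x1 convolutions, which is consistent with the fewer-parameters claim above; the dilation enlarges the receptive field at no extra parameter cost.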
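    The self-supervised discriminator is likewise described only at a high level; the sketch below shows one plausible reading, in the spirit of reconstruction-regularized GAN discriminators: an encoder shared between a patch-level real/fake head and a small decoder that reconstructs real inputs, with an L1 reconstruction term added to a hinge adversarial loss. The architecture, the hinge loss, and the recon_weight parameter are assumptions for illustration.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ReconstructingDiscriminator(nn.Module):
            """Discriminator with an auxiliary reconstruction decoder (sketch)."""
            def __init__(self, ch=64):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                    nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                    nn.Conv2d(ch * 2, ch * 4, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                )
                self.head = nn.Conv2d(ch * 4, 1, 3, padding=1)   # patch-level logits
                self.decoder = nn.Sequential(                    # reconstruction branch
                    nn.Upsample(scale_factor=2),
                    nn.Conv2d(ch * 4, ch * 2, 3, padding=1), nn.ReLU(),
                    nn.Upsample(scale_factor=2),
                    nn.Conv2d(ch * 2, ch, 3, padding=1), nn.ReLU(),
                    nn.Upsample(scale_factor=2),
                    nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh(),
                )

            def forward(self, x):
                feat = self.encoder(x)
                return self.head(feat), self.decoder(feat)

        def d_loss(disc, real, fake, recon_weight=1.0):
            """Hinge adversarial loss plus a self-supervised reconstruction term."""
            logits_real, recon = disc(real)
            logits_fake, _ = disc(fake.detach())
            adv = F.relu(1.0 - logits_real).mean() + F.relu(1.0 + logits_fake).mean()
            rec = F.l1_loss(recon, real)   # computed on real images only
            return adv + recon_weight * rec

    Because the reconstruction term is computed only on real images, the discriminator must retain enough information to redraw the training image, which regularizes its features and, per the abstracts, keeps the generated images from diverging.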

    Table of Contents:
    Abstract in Chinese (i)
    Abstract in English (ii)
    Acknowledgements (iii)
    Contents (iv)
    List of Figures (vi)
    List of Tables (viii)
    1 Introduction (1)
    2 Related Work (4)
        2.1 Single Image Generative Models (4)
        2.2 Few-Shot Generative Models (5)
    3 Proposed Method (6)
        3.1 Multi-scale training (6)
        3.2 Dilated involution (9)
        3.3 Self-supervised discriminator (12)
    4 Experiments (15)
        4.1 Quantitative Evaluation (15)
        4.2 Ablation Study (22)
    5 Conclusions (26)
    References (27)
    Letter of Authority (31)


    Full-Text Release Date: 2027/01/26 (campus network)
    Full-Text Release Date: 2027/01/26 (off-campus network)
    Full-Text Release Date: 2027/01/26 (National Central Library: Taiwan NDLTD system)