
Graduate Student: 丁寧 (Ning Ting)
Thesis Title: 基於卷積神經網路以使用者簡單筆觸進行影像修補及編輯之系統
(CNN-based Image Inpainting and Editing with User's Freehand Strokes)
Advisor: 王乃堅 (Nai-Jian Wang)
Committee Members: 蘇順豐 (Shun-Feng Su), 鍾順平 (Shun-Ping Chung), 呂學坤 (Shyue-Kung Lu), 郭景明 (Jing-Ming Guo), 王乃堅 (Nai-Jian Wang)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2021
Academic Year of Graduation: 109 (ROC calendar)
Language: Chinese
Pages: 74
Chinese Keywords: 深度學習, 影像修補, 類神經網路, 影像紋理, Gated Convolution
Foreign Keywords: Deep Learning, Image Inpainting, Neural Network, Image Texture, Gated Convolution
Usage Counts: Views: 266, Downloads: 0

With the rapid advance of technology in recent years, neural networks have once again come to the fore. Rapid upgrades in hardware and the popularization of GPUs mean that training and deploying neural networks is no longer as complicated and costly as it once was. Among the applications of neural network techniques, computer vision in particular has achieved breakthrough development, making leaps of progress through the use of neural networks.

This thesis implements a system that edits and restores images based on information supplied by the user. Freehand stroke input guides a neural network's image synthesis, so that the generated image matches the user's intent, or restores an image with missing or defaced regions to its original appearance. For completeness and accuracy of image composition, a U-Net-like structure is used as the main network architecture, learning multi-scale features to improve the results; two datasets, Place_365 and CelebA_HQ, are used to train on scene images and face images respectively. To overcome the difficulty that image inpainting has with irregular regions, Gated Convolution is incorporated so that the whole architecture adapts well to irregularly shaped areas awaiting repair.
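To make the role of Gated Convolution concrete, below is a minimal PyTorch sketch of a gated convolution layer in the spirit of Yu et al. ("Free-Form Image Inpainting with Gated Convolution", ICCV 2019). The channel counts, kernel size, ELU activation, and the image-plus-mask input layout are illustrative assumptions, not the thesis's exact configuration.

```python
# Minimal sketch of a gated convolution layer (after Yu et al., ICCV 2019).
# Channel sizes, kernel size, and the ELU activation are illustrative
# assumptions, not the thesis's exact settings.
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Learns a soft, per-pixel gate so the layer can down-weight invalid
    (masked) pixels instead of treating every pixel as valid."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, stride: int = 1):
        super().__init__()
        padding = kernel_size // 2  # keep spatial size when stride == 1
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.act = nn.ELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # output = activation(features) * sigmoid(gating); the gate is learned
        # from the same input, which is what lets the network adapt to
        # irregular holes rather than assuming a rectangular mask.
        return self.act(self.feature(x)) * torch.sigmoid(self.gate(x))

# Example: an RGB image concatenated with its binary mask as a 4th channel.
x = torch.randn(1, 4, 256, 256)
print(GatedConv2d(4, 32)(x).shape)  # torch.Size([1, 32, 256, 256])
```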
Experimental results show that our method successfully carries out image editing and inpainting according to the user's input conditions, generating images that hold up under human visual evaluation. Measured with standard image-quality metrics, our approach reaches a Structural Similarity (SSIM) of 0.911 and a Peak Signal-to-Noise Ratio (PSNR) of 27.2 dB, comparable to or better than other published image-inpainting methods.
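For reference, the two reported metrics can be computed as in the sketch below, assuming 8-bit images; the random stand-in arrays and the use of scikit-image's structural_similarity are illustrative choices, since the thesis does not publish its evaluation code.

```python
# Minimal sketch of the two reported metrics, PSNR and SSIM, assuming 8-bit
# images; the random stand-in arrays and the use of scikit-image are
# illustrative (the thesis does not publish its evaluation code).
import numpy as np
from skimage.metrics import structural_similarity  # needs skimage >= 0.19 for channel_axis

def psnr(reference: np.ndarray, restored: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

# Stand-in "ground truth" and "restored" images; in real use these would be
# the original image and the network's inpainted output.
rng = np.random.default_rng(0)
gt = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
restored = np.clip(gt.astype(np.int16) + rng.integers(-10, 11, size=gt.shape),
                   0, 255).astype(np.uint8)

print("PSNR (dB):", psnr(gt, restored))
print("SSIM:", structural_similarity(gt, restored, channel_axis=-1))
```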

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
  1.1 Research Background
  1.2 Literature Review
  1.3 Research Motivation and Goals
  1.4 Thesis Organization
Chapter 2: Neural Networks
  2.1 Artificial Neural Networks (ANN)
    2.1.1 Neurons
    2.1.2 Forward Propagation
    2.1.3 Activation Functions
    2.1.4 Back Propagation
    2.1.5 Loss Functions
  2.2 Convolutional Neural Networks (CNN)
    2.2.1 Convolution Layer
    2.2.2 Pooling Layer
  2.3 Network Architectures
    2.3.1 U-Net
    2.3.2 Nested U-Net
    2.3.3 VGG-16
  2.4 Optimization Algorithms
  2.5 Handling of Regions to Be Inpainted
Chapter 3: Image Inpainting and Editing System
  3.1 System Flow and Network Architecture
    3.1.1 Network Architecture
    3.1.2 Network Flow
  3.2 System Training and Optimization
    3.2.1 Gated Convolution for Masks
    3.2.2 Batch Normalization
    3.2.3 Adam Optimization Algorithm (Adaptive Moment Estimation)
    3.2.4 Activation Functions
  3.3 Loss Functions
    3.3.1 Per-pixel Loss
    3.3.2 Total Variation Loss
    3.3.3 Perceptual Loss
    3.3.4 Style Loss
    3.3.5 Loss Function Combination
Chapter 4: Experimental Data and Result Analysis
  4.1 Experimental Environment Specifications
  4.2 Data and Datasets
    4.2.1 Place365_Standard
    4.2.2 CelebA_HQ
  4.3 Preprocessing
    4.3.1 Mask Generation
    4.3.2 Image Edge Generation (Simulating Hand-Drawn Strokes)
  4.4 Experimental Results and Analysis
    4.4.1 Result Evaluation Methods
    4.4.2 Effect of Different Loss Functions on Results
    4.4.3 Effect of Different Architectures with the Same Method on Results
    4.4.4 Effect of Different Mask Update Strategies on Results
    4.4.5 Results of Applying This System to Image Inpainting and Editing
    4.4.6 Comparison of This System's Results with Other Methods
    4.4.7 Extended Application: Super-Resolution Tasks
Chapter 5: Conclusion and Future Research Directions
  5.1 Conclusion
  5.2 Future Research Directions
References


Full-Text Release Date: 2026/08/04 (campus network)
Full-Text Release Date: 2026/08/04 (off-campus network)
Full-Text Release Date: 2026/08/04 (National Central Library: Taiwan NDLTD system)