| Field | Value |
|---|---|
| Graduate Student | Ning Ting (丁寧) |
| Thesis Title | CNN-based Image Inpainting and Editing with User's Freehand Strokes (基於卷積神經網路以使用者簡單筆觸進行影像修補及編輯之系統) |
| Advisor | Nai-Jian Wang (王乃堅) |
| Committee Members | Shun-Feng Su (蘇順豐), Shun-Ping Chung (鍾順平), Shyue-Kung Lu (呂學坤), Jing-Ming Guo (郭景明), Nai-Jian Wang (王乃堅) |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science (電資學院 電機工程系) |
| Publication Year | 2021 |
| Graduation Academic Year | 109 |
| Language | Chinese |
| Pages | 74 |
| Keywords | Deep Learning, Image Inpainting, Neural Network, Image Texture, Gated Convolution |
With the rapid advance of technology in recent years, neural-network techniques have returned to the spotlight. Thanks to rapid hardware upgrades and the widespread availability of GPUs, training and deploying neural networks is no longer as complex and costly as it once was. Among the applications of neural-network technology, computer vision in particular has seen breakthrough, leapfrog progress.
This thesis implements a system that edits and restores images according to information supplied by the user. Stroke input guides a neural network in generating image content, so that the generated image matches the user's intent or restores a damaged or defaced image to its original appearance. To preserve the completeness and accuracy of image composition, the method adopts a U-Net-like structure as the main network architecture and learns multi-scale features to improve the results; the Place_365 and CelebA_HQ datasets are used to train on scene images and face images, respectively. To overcome the difficulty that inpainting has with irregular regions, Gated Convolution is incorporated so that the architecture adapts well to irregularly shaped regions to be repaired.
Experimental results show that the proposed method can successfully edit and inpaint images according to the user's input conditions, generating images that satisfy human visual evaluation. Using image-quality metrics as evaluation criteria, the method achieves 0.911 in Structural Similarity (SSIM) and 27.2 dB in Peak Signal-to-Noise Ratio (PSNR), comparable to or better than the image-inpainting methods proposed in other studies.
With the ever-changing nature of technology, neural networks have once again been widely discussed. Benefiting from the rapid progress of hardware and the popularization of GPUs, training and implementing neural networks is no longer as complicated and costly as before.
Among the applications of neural networks, computer vision in particular has achieved breakthrough development, making leapfrog progress through the use of neural networks.
We present an image editing and inpainting system that synthesizes images based on users' inputs. By drawing textures or lines, the user can edit an image to match their intent, or restore a missing or defaced image to its original appearance. In our implementation, a U-Net-like network is used as the main architecture to preserve the completeness and accuracy of image composition, and we train it on two datasets: Place_365 for scenery and CelebA_HQ for portraits. Because irregular holes are difficult to repair, we adopt Gated Convolution, which makes the results more robust to irregular masks.
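The key idea of gated convolution (Yu et al., reference [4]) is to pair each convolution with a second convolution whose sigmoid output acts as a learned soft mask, letting the network down-weight invalid (hole) pixels under free-form masks. As a minimal single-channel NumPy sketch of that mechanism (not the thesis's actual implementation, which operates on multi-channel feature maps inside the U-Net-like generator):

```python
import numpy as np

def conv2d(x, w):
    """Valid-mode 2D correlation for a single-channel image x with kernel w."""
    k = w.shape[0]
    h, wd = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((h, wd))
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def gated_conv2d(x, w_feature, w_gate):
    """Gated convolution: a learned sigmoid gate modulates the feature response,
    so the layer can softly suppress contributions from hole regions."""
    feature = np.tanh(conv2d(x, w_feature))          # feature branch
    gate = 1.0 / (1.0 + np.exp(-conv2d(x, w_gate)))  # sigmoid gate in (0, 1)
    return feature * gate                            # element-wise soft masking
```

Unlike partial convolution's hard, rule-based mask update, the gate here is learned per location from data, which is what gives the architecture its adaptability to irregular masks.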
Experimental results show that our method successfully completes image editing and inpainting tasks according to the user's input conditions, generating images that meet human visual evaluation. Our approach achieves 0.911 in Structural Similarity (SSIM) and 27.2 dB in Peak Signal-to-Noise Ratio (PSNR), matching or outperforming state-of-the-art image-inpainting approaches.
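For reference, the two reported metrics can be illustrated as follows. This is not the thesis's evaluation code; the SSIM shown uses global image statistics for brevity, whereas the standard metric averages the same formula over local sliding windows:

```python
import numpy as np

def psnr(ref, img, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """Simplified SSIM computed from global statistics (luminance, contrast,
    covariance); identical images score exactly 1.0."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Higher is better for both: SSIM lies in [-1, 1] with 1 meaning structural identity, while PSNR grows without bound as the mean squared error approaches zero.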
[1] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, 2015.
[2] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 million image database for scene recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 6, pp. 1452–1464, 2017.
[3] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of gans for improved quality, stability, and variation,” arXiv preprint arXiv:1710.10196, 2017.
[4] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, “Free-form image inpainting with gated convolution,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4471–4480, 2019.
[5] A. Criminisi, P. Pérez, and K. Toyama, “Region filling and object removal by exemplar-based image inpainting,” IEEE Transactions on Image Processing, vol. 13, no. 9, pp. 1200–1212, 2004.
[6] K. He and J. Sun, “Statistics of patch offsets for image completion,” in European Conference on Computer Vision (ECCV), pp. 16–29, Springer, 2012.
[7] J. Sun, L. Yuan, J. Jia, and H.-Y. Shum, “Image completion with structure propagation,” in ACM SIGGRAPH 2005 Papers, pp. 861–868, 2005.
[8] S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Globally and locally consistent image completion,” ACM Transactions on Graphics (ToG), vol. 36, no. 4, pp. 1–14, 2017.
[9] G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, and B. Catanzaro, “Image inpainting for irregular holes using partial convolutions,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 85–100, 2018.
[10] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, “Generative image inpainting with contextual attention,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5505–5514, 2018.
[11] Z. Yan, X. Li, M. Li, W. Zuo, and S. Shan, “Shift-net: Image inpainting via deep feature rearrangement,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 1–17, 2018.
[12] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in International Conference on Machine Learning, 2010.
[13] F. Yu, D. Wang, E. Shelhamer, and T. Darrell, “Deep layer aggregation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2403–2412, 2018.
[14] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “Unet++: A nested u-net architecture for medical image segmentation,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 3–11, Springer, 2018.
[15] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[16] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, pp. 448–456, PMLR, 2015.
[17] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[18] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and superresolution,” in European Conference on Computer Vision, pp. 694–711, Springer, 2016.
[19] M. S. Sajjadi, B. Scholkopf, and M. Hirsch, “Enhancenet: Single image super-resolution through automated texture synthesis,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 4491–4500, 2017.
[20] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, IEEE, 2009.
[21] S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1395–1403, 2015.
[22] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690, 2017.
[23] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144, 2017.