
Student: Pin-Yi Yeh (葉品儀)
Thesis Title: Multi-Scale Neural Network with Dilated Convolutions for Image Deblurring (多尺度神經網路結合空洞卷積的影像去模糊方法)
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Kai-Lung Hua (花凱龍), Jun-Cheng Chen (陳駿丞), Kuo-Liang Chung (鍾國亮), Chuan-Kai Yang (楊傳凱), Jing-Ming Guo (郭景明)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2019
Graduation Academic Year: 107 (2018-2019)
Language: English
Number of Pages: 49
Chinese Keywords: Image Deblurring, Dynamic Scene Deblurring, Dilated Convolution, Multi-Scale Network Architecture
Foreign-Language Keywords: Blind Motion Deblurring, Convolutional Neural Network, Dilated Convolution, Multi-Scale Network
Access Count: Views: 238 / Downloads: 0

Deblurring is an important and challenging topic in image processing. Blur in an image can be roughly divided into two kinds: local blur caused by object motion, and global blur caused by camera shake, depth of field, or defocus. When blur of varying directions and magnitudes appears in the same image, the deblurring task becomes even more challenging. For single-image deblurring, deep learning-based methods have already solved parts of the problem; however, the task is not only to recover a sharper image, as execution efficiency also matters greatly. In this thesis, we propose a new framework for image deblurring based on dilated convolutions and a symmetric encoder-decoder architecture. Borrowing from single-image super-resolution, we adopt a pyramid structure that progressively restores the image from low to high resolution; feeding the network inputs at different scales helps us capture more details for restoration. We pair this with the encoder-decoder architecture, which has achieved remarkable results in semantic segmentation, using downsampling to reduce the number of parameters. To compensate for the information lost through downsampling, we use dilated convolutions to enlarge the receptive field: dilated convolutions capture more information from the feature maps and help reconstruct a sharper image, without adding extra parameters, so the network's complexity stays unchanged. Our network not only produces more accurate details but also runs markedly faster. For a fair comparison, we train and test on the same standard benchmark dataset; compared with other state-of-the-art deep learning deblurring methods, our method restores sharper full-color images with shorter running time.
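The claim above, that dilated convolutions enlarge the receptive field at no extra parameter cost, can be checked with a short sketch. This is a minimal illustration assuming TensorFlow/Keras, not the thesis implementation; the input and channel sizes are arbitrary.

import tensorflow as tf

# A 3x3 convolution with dilation rate d covers a (2d+1) x (2d+1) window,
# yet still holds only 3*3*C_in*C_out weights (plus biases), so the
# receptive field grows while the parameter count stays fixed.
inp = tf.keras.Input(shape=(256, 256, 64))
plain = tf.keras.layers.Conv2D(64, 3, padding="same")(inp)                      # 3x3 window
dilated = tf.keras.layers.Conv2D(64, 3, padding="same", dilation_rate=2)(inp)   # 5x5 window

print(tf.keras.Model(inp, plain).count_params())    # 36928
print(tf.keras.Model(inp, dilated).count_params())  # 36928 -- identical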


Several deep learning-based approaches, particularly convolutional neural networks (CNNs), have been successful in single-image deblurring. Unlike traditional methods, which estimate the blur kernel in order to recover the latent sharp image, CNN-based methods directly learn the mapping from the blurry input image to the latent sharp image. A CNN usually has many layers to represent complex spatial relationships, and down-sampling layers are used to reduce the number of parameters (e.g., in an encoder-decoder architecture). However, down-sampling discards some spatial information that could be useful for deblurring large regions. The receptive field is the spatial coverage of each feature, and enlarging it mitigates this loss of spatial information. We use dilated convolutions to increase the receptive field of the features without increasing the number of parameters. Furthermore, in this thesis we apply a "coarse-to-fine" strategy, feeding the network the blurry input image at different scales. This strategy progressively improves the outputs and lets the network capture details from different scales, again without adding more parameters. We show that the proposed model not only produces better results than the state of the art but also has a faster execution time.
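As a rough sketch of the coarse-to-fine strategy described above: the blurry input is downsampled into a pyramid, the coarsest level is deblurred first, and each estimate is upsampled and refined together with the input at the next scale. The names below (coarse_to_fine, deblur_net, num_scales) are illustrative placeholders, not names from the thesis.

import tensorflow as tf

def coarse_to_fine(blurry, deblur_net, num_scales=3):
    # blurry: a (batch, H, W, 3) tensor with static spatial dims.
    # deblur_net: any callable mapping a 6-channel input (blurry scale
    # concatenated with the upsampled estimate) to a 3-channel image.
    h, w = blurry.shape[1], blurry.shape[2]
    # Build the input pyramid, coarsest scale first (H/4, H/2, H for 3 scales).
    pyramid = [tf.image.resize(blurry, (h // 2 ** s, w // 2 ** s))
               for s in reversed(range(num_scales))]
    estimate = pyramid[0]  # initialize with the coarsest blurry image
    for scale_img in pyramid:
        # Upsample the previous estimate to the current scale, then refine
        # it jointly with the blurry input at this scale.
        estimate = tf.image.resize(estimate, (scale_img.shape[1], scale_img.shape[2]))
        estimate = deblur_net(tf.concat([scale_img, estimate], axis=-1))
    return estimate  # full-resolution sharp estimate

A stand-in such as deblur_net = tf.keras.layers.Conv2D(3, 3, padding="same") is enough to run the sketch end to end. Sharing one network across all scales matches the abstract's claim that the multi-scale strategy adds no parameters.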

Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Works
3 Proposed Method
3.1 Pre-processing for Multi-scale
3.2 Network Architecture
3.2.1 Encoder-Decoder Structure
3.2.2 Dilated Convolution
3.2.3 Residual Learning
3.3 Loss Function
4 Experimental Results
4.1 Implementation Details
4.2 Training and Testing Datasets
4.3 Ablation Studies
4.4 Comparison with Other Methods
5 Conclusions and Future Work
References


Full-Text Release Date: 2024/07/30 (campus network)
Full-Text Release Date: 2029/07/30 (off-campus network)
Full-Text Release Date: 2029/07/30 (National Central Library: Taiwan NDLTD system)