
Student: 葉胤呈 (Yin-Chen Yeh)
Thesis Title: VDNet: Video Deinterlacing Network Based on Coarse Adaptive Module and Deformable Recurrent Residual Network
Advisor: 花凱龍 (Kai-Lung Hua)
Committee Members: 陳永耀 (Yung-Yao Chen), 項天瑞 (Tien-Ruey Hsiang), 郭景明 (Jing-Ming Guo), 鐘國亮 (Kuo-Liang Chung), 花凱龍 (Kai-Lung Hua)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2021
Graduation Academic Year: 109 (ROC calendar)
Language: English
Number of Pages: 54
Keywords: Deinterlacing, Space-time video super-resolution, Video restoration, Deep learning
Abstract: Deinterlacing restores interlaced videos to progressive videos; its two main goals are removing visual artifacts and doubling the number of frames. Recent deinterlacing approaches operate on a single interlaced image and restore it directly, without optimally leveraging the temporal information a video provides. To improve this situation, we propose VDNet, a video deinterlacing framework that, to the best of our knowledge, is the first deep-learning-based deinterlacing framework to consider the inter-frame correlation within an interlaced video. We view video deinterlacing as combining a basic image sequence, generated by a simple coarse method, with a residual image sequence generated carefully at the feature level. Since a missing pixel in video deinterlacing can easily be interpolated from its spatial or temporal neighbors, we design a data module that leverages our proposed Coarse Adaptive Module to select a reliable basic image sequence from several diverse candidates. To provide a stable residual image sequence, we design a residual module that leverages our proposed Deformable Recurrent Residual Network to optimally enhance and aggregate the features extracted or synthesized from the interlaced video. After reconstruction, our proposed Spatial-Temporal Correlation Loss uses the information already present in the interlaced video to further smooth and sharpen the deinterlaced result along the spatial and temporal directions. Extensive experiments demonstrate that VDNet achieves strong quantitative performance. Moreover, we carefully trade off the parameter count of the entire network to avoid imposing a heavy computational burden.
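The claim that a missing field row can be filled in from spatial or temporal neighbours is classical deinterlacing practice. The sketch below is a minimal NumPy illustration, not the thesis code, of two such coarse basic frames that a selector like the proposed Coarse Adaptive Module could choose between: a spatial estimate that averages the valid rows above and below each missing row (line averaging), and a temporal estimate that copies missing rows from the adjacent field of opposite parity (weave). The field layout, with valid even rows in top fields and valid odd rows in bottom fields, is an assumption made for illustration.

```python
import numpy as np

def spatial_basic(field: np.ndarray, top_field: bool) -> np.ndarray:
    """Coarse spatial estimate: fill each missing row with the average
    of the nearest valid rows above and below it (line averaging)."""
    frame = field.astype(np.float32).copy()
    h = frame.shape[0]
    missing = range(1, h, 2) if top_field else range(0, h, 2)
    for r in missing:
        above = frame[r - 1] if r > 0 else frame[r + 1]
        below = frame[r + 1] if r + 1 < h else frame[r - 1]
        frame[r] = 0.5 * (above + below)
    return frame

def temporal_basic(field: np.ndarray, neighbor_field: np.ndarray,
                   top_field: bool) -> np.ndarray:
    """Coarse temporal estimate: copy each missing row from the
    temporally adjacent field of opposite parity (weave)."""
    frame = field.astype(np.float32).copy()
    h = frame.shape[0]
    missing = range(1, h, 2) if top_field else range(0, h, 2)
    for r in missing:
        frame[r] = neighbor_field[r]
    return frame
```

Per the abstract, the Coarse Adaptive Module's job is only to pick a reliable basic sequence from diverse candidates such as these; how it weights them is not specified in this record.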
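The internals of the Deformable Recurrent Residual Network are likewise not given here, so the cell below is a hypothetical sketch of the two ingredients its name implies: the previous hidden state is aligned to the current frame's features with a deformable convolution (torchvision.ops.DeformConv2d) and then fused through a residual update. Every layer shape and name is an assumption.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableRecurrentCell(nn.Module):
    """Illustrative recurrent residual cell with deformable alignment."""

    def __init__(self, channels: int = 64, kernel: int = 3):
        super().__init__()
        pad = kernel // 2
        # 2 offsets (x, y) per kernel tap, one offset group.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel * kernel,
                                     kernel, padding=pad)
        self.align = DeformConv2d(channels, channels, kernel, padding=pad)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel, padding=pad),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel, padding=pad),
        )

    def forward(self, feat: torch.Tensor,
                hidden: torch.Tensor) -> torch.Tensor:
        # Predict sampling offsets from current features and past state,
        # warp the hidden state, then apply a residual update.
        offset = self.offset_pred(torch.cat([feat, hidden], dim=1))
        aligned = self.align(hidden, offset)
        return feat + self.fuse(torch.cat([feat, aligned], dim=1))
```

Iterating this cell over the per-frame features, feeding the returned tensor back in as the next hidden state, supplies the recurrence; projecting its output back to pixels would yield the residual image sequence the abstract describes.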
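Finally, the abstract says only that the Spatial-Temporal Correlation Loss reuses information already present in the interlaced input. One plausible formulation, an assumption rather than the thesis's definition, penalises reconstructed frames wherever they disagree with the rows that were actually observed, plus a temporal smoothness term between consecutive outputs:

```python
import torch
import torch.nn.functional as F

def st_correlation_loss(outputs: torch.Tensor, fields: torch.Tensor,
                        top_first: bool = True) -> torch.Tensor:
    """Hypothetical spatial-temporal consistency loss.

    outputs: (T, C, H, W) reconstructed progressive frames.
    fields:  (T, C, H, W) interlaced input; only the observed rows of
             each frame are meaningful (top/bottom fields alternating).
    """
    t = outputs.shape[0]
    spatial = outputs.new_zeros(())
    for i in range(t):
        is_top = (i % 2 == 0) == top_first
        rows = slice(0, None, 2) if is_top else slice(1, None, 2)
        # Observed rows must be reproduced (spatial term).
        spatial = spatial + F.l1_loss(outputs[i, :, rows],
                                      fields[i, :, rows])
    # Consecutive frames should change smoothly (temporal term).
    temporal = F.l1_loss(outputs[1:], outputs[:-1])
    return spatial / t + 0.1 * temporal  # 0.1 is an illustrative weight
```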

    Recommendation Letter
    Approval Letter
    Abstract in Chinese
    Abstract in English
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
      2.1 Deinterlacing
      2.2 Video Super-Resolution
      2.3 Space-Time Video Super-Resolution
    3 Method
      3.1 Data Module
        3.1.1 Diverse Basic Image Sequences
        3.1.2 Coarse Adaptive Module
      3.2 Residual Module
        3.2.1 Feature Extraction
        3.2.2 Feature Temporal Interpolation
        3.2.3 Temporal Aggregation and Reconstruction
      3.3 Frame Reconstruction
        3.3.1 Reconstruction Loss
        3.3.2 Perceptual Loss
        3.3.3 Spatial-Temporal Correlation Loss
      3.4 Implementation Details
    4 Experiments
      4.1 Experimental Setup
        4.1.1 Dataset
        4.1.2 Evaluation
      4.2 Comparison to State-of-the-art Methods
      4.3 Ablation Study
        4.3.1 Our Architecture
        4.3.2 Basic Image Sequence
        4.3.3 Deformable RRN
        4.3.4 Different Losses
    5 Conclusions
    References
    Letter of Authority


    Full-text release date: 2026/09/06 (campus network)
    Full-text release date: 2026/09/06 (off-campus network)
    Full-text release date: 2026/09/06 (National Central Library: Taiwan NDLTD system)