
Author: Ding-Jie Lin (林鼎傑)
Title: Patch-based Adaptive Resolution Image Matting (基於 Patch 的自適應解析度去背模型)
Advisor: Wen-Kai Tai (戴文凱)
Committee members: Ming-Te Chi (紀明德), Tai-Lin Chin (金台齡), Wen-Kai Tai (戴文凱)
Degree: Master
Department: College of Electrical Engineering and Computer Science – Department of Computer Science and Information Engineering
Publication year: 2023
Academic year of graduation: 111 (ROC calendar, i.e., 2022–2023)
Language: Chinese
Pages: 49
Keywords: Deep learning, Image Matting, Compositing, Adaptive resolution, Patch-based learning

Image matting has been widely applied in image and video editing and compositing. However, the images produced by today's cameras often reach tens of millions or even hundreds of millions of pixels. For high-resolution images, existing matting applications mostly downsample the image before matting rather than operating at the original resolution. We want to perform matting directly on high-resolution images, and therefore process them in patches, i.e., we adopt a patch-based matting model.

In this thesis, we propose a new patch-based image matting model and replace the Max Unpooling layer commonly used in image matting models with our proposed Residual Max Unpooling layer. Besides preserving feature positions as Max Unpooling does, the Residual Max Unpooling layer can also make effective use of the features discarded by the Max Pooling layer. In addition, our model is paired with a ResizedNet that supplies the long-range information a patch-based model lacks, and a ShallowNet that supplies the shallow information lost in the deep layers of the model.

For the proposed Residual Max Unpooling layer, we show that it effectively improves model performance by comparing it against two upsampling alternatives: the original Max Unpooling and bilinear upsampling. Finally, we compare our model with the existing matting models GCA-Matting and HDMatt and obtain excellent results on composite images. We also verify on real-world images that the model trained on composite images produces good matting results, which further demonstrates that our method generalizes to real-world images.


Image matting has been widely applied in the editing and compositing of images and videos. However, the images produced by today's cameras often reach tens of millions or even hundreds of millions of pixels. For high-resolution images, existing matting applications usually downsample the image before performing the matting process instead of working at the original resolution. We aim to perform matting directly on high-resolution images, so we adopt a patch-based matting model.
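As a rough illustration of the patch-based pipeline (the thesis's own splitting and merging scheme is described in its Section 3.2 and may differ, e.g., in patch size or in using overlapping patches), a minimal non-overlapping split-and-merge can be sketched as follows; the patch size of 512 is an assumption:

```python
import numpy as np

def split_into_patches(img, patch=512):
    """Pad so height and width divide evenly, then cut into non-overlapping patches."""
    H, W = img.shape[:2]
    ph, pw = -H % patch, -W % patch
    pad = ((0, ph), (0, pw)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img, pad, mode="reflect")
    patches = []
    for y in range(0, padded.shape[0], patch):
        for x in range(0, padded.shape[1], patch):
            patches.append(padded[y:y + patch, x:x + patch])
    return patches, padded.shape[:2]

def merge_patches(patches, padded_hw, orig_hw, patch=512):
    """Stitch per-patch predictions (e.g., alpha mattes) back and crop to the original size."""
    Hp, Wp = padded_hw
    out = np.zeros(padded_hw, dtype=patches[0].dtype)
    i = 0
    for y in range(0, Hp, patch):
        for x in range(0, Wp, patch):
            out[y:y + patch, x:x + patch] = patches[i]
            i += 1
    H, W = orig_hw
    return out[:H, :W]
```

Splitting and re-merging an image this way is lossless, which is what allows a matting network to run on fixed-size patches while the final alpha matte is assembled at the full original resolution.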

In this thesis, we propose a new patch-based image matting model and replace the Max Unpooling layer commonly used in image matting models with our proposed Residual Max Unpooling layer. The Residual Max Unpooling layer not only preserves feature positions as the Max Unpooling layer does, but also makes effective use of the features discarded by the Max Pooling layer. In addition, our model uses a ResizedNet to supply the long-range information a patch-based model lacks, and a ShallowNet to supply the shallow information lost in the deep layers of the model.
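The abstract does not spell out the Residual Max Unpooling layer, so the following is a speculative sketch of one plausible reading, under the assumption that the "residual" is the pre-pooling feature map minus what plain max unpooling can reconstruct (i.e., exactly the values the max operation discarded); all function names here are hypothetical:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling; also returns the argmax position (0..3) within each window."""
    H, W = x.shape
    windows = x.reshape(H // 2, 2, W // 2, 2).transpose(0, 2, 1, 3).reshape(H // 2, W // 2, 4)
    return windows.max(axis=-1), windows.argmax(axis=-1)

def max_unpool_2x2(pooled, idx):
    """Standard max unpooling: scatter each value back to its argmax position, zeros elsewhere."""
    h, w = pooled.shape
    out = np.zeros((h, w, 4), dtype=pooled.dtype)
    rows, cols = np.indices((h, w))
    out[rows, cols, idx] = pooled
    return out.reshape(h, w, 2, 2).transpose(0, 2, 1, 3).reshape(h * 2, w * 2)

def residual_max_unpool_2x2(pooled, idx, pre_pool):
    """Max unpooling plus the residual features the max operation threw away."""
    unpooled = max_unpool_2x2(pooled, idx)
    # Residual: the pre-pooling map with its per-window maxima zeroed out.
    residual = pre_pool - max_unpool_2x2(*max_pool_2x2(pre_pool))
    return unpooled + residual
```

Under this reading, when the decoder features coincide with the encoder's pooled features, the layer reconstructs the pre-pooling map exactly; in a real network the two differ, so the layer combines decoded maxima with the encoder's discarded detail.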

We show that the proposed Residual Max Unpooling layer effectively improves model performance by comparing it against the original Max Unpooling and bilinear upsampling. Finally, we compare the proposed model with the existing matting models GCA-Matting and HDMatt and achieve excellent results on composite images.
We also validate our approach on real-world images, demonstrating that the model trained on composite images produces good matting results there as well. This further shows that our method generalizes to real-world images.
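The comparisons above rely on quantitative matting metrics (the thesis's exact metric set is listed in its Section 4.2, which this abstract does not reproduce). SAD and MSE restricted to the trimap's unknown region are the conventional choices in the matting literature; a minimal sketch, assuming the common 0/128/255 trimap encoding:

```python
import numpy as np

def sad(pred_alpha, gt_alpha, mask=None):
    """Sum of Absolute Differences, conventionally reported divided by 1000."""
    diff = np.abs(pred_alpha - gt_alpha)
    if mask is not None:
        diff = diff[mask]
    return diff.sum() / 1000.0

def mse(pred_alpha, gt_alpha, mask=None):
    """Mean Squared Error over the evaluated region."""
    diff = (pred_alpha - gt_alpha) ** 2
    if mask is not None:
        diff = diff[mask]
    return diff.mean()

def unknown_mask(trimap):
    """Matting metrics are usually computed only over the trimap's unknown region;
    the value 128 for 'unknown' is an assumption about the encoding."""
    return trimap == 128
```

Restricting the metrics to the unknown region is what makes them meaningful: inside the known foreground and background the trimap already fixes the alpha values.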

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation and Goals
1.2 Method Overview
1.3 Contributions
1.4 Thesis Organization
2 Related Work
2.1 Image Matting
2.2 Attention
2.3 High-Resolution
3 Method
3.1 Overview
3.2 Patch Splitting and Merging
3.3 Residual Max Unpooling (ResMaxUnpool)
3.4 Image Matting
3.5 Training Settings
4 Experiments and Results
4.1 Datasets
4.2 Evaluation Metrics
4.3 Ablation Study
5 Conclusion and Future Work
5.1 Contributions and Conclusion
5.2 Future Work
References

[1] T. Wei, D. Chen, W. Zhou, J. Liao, H. Zhao, W. Zhang, and N. Yu, “Improved image matting via real-time user clicks and uncertainty estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15374–15383, 2021.
[2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[3] H. Lu, Y. Dai, C. Shen, and S. Xu, “Indices matter: Learning to index for deep image matting,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3266–3275, 2019.
[4] H. Yu, N. Xu, Z. Huang, Y. Zhou, and H. Shi, “High-resolution deep image matting,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 3217–3224, May 2021.
[5] Y. Li and H. Lu, “Natural image matting via guided contextual attention,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11450–11457, Apr. 2020.
[6] PicWish, “Online background remover 100% free.” https://picwish.com/.
[7] REMOVAL.AI LTD., “Background remover: Create transparent background.” https://removal.ai/, May 2022.
[8] Kaleido, “Remove Background from Image for Free – remove.bg.” https://www.remove.bg/.
[9] Adobe Inc., “Adobe Photoshop.” https://www.adobe.com/products/photoshop.html.
[10] D. Cho, Y.-W. Tai, and I. Kweon, “Natural image matting using deep convolutional neural networks,” in European Conference on Computer Vision, pp. 626–643, Springer, 2016.
[11] N. Xu, B. Price, S. Cohen, and T. Huang, “Deep image matting,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2970–2979, 2017.
[12] Q. Chen, T. Ge, Y. Xu, Z. Zhang, X. Yang, and K. Gai, “Semantic human matting,” in Proceedings of the 26th ACM International Conference on Multimedia, pp. 618–626, 2018.
[13] Y. Zhang, L. Gong, L. Fan, P. Ren, Q. Huang, H. Bao, and W. Xu, “A late fusion CNN for digital matting,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7469–7478, 2019.
[14] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
[15] Q. Liu, H. Xie, S. Zhang, B. Zhong, and R. Ji, “Long-range feature propagating for natural image matting,” in Proceedings of the 29th ACM International Conference on Multimedia, pp. 526–534, 2021.
[16] R. Wang, J. Xie, J. Han, and D. Qi, “Improving deep image matting via local smoothness assumption,” in 2022 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, IEEE, 2022.
[17] J. Wang and M. F. Cohen, “An iterative optimization approach for unified image segmentation and matting,” in Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, vol. 2, pp. 936–943, IEEE, 2005.
[18] H. Ding, H. Zhang, C. Liu, and X. Jiang, “Deep interactive image matting with feature propagation,” IEEE Transactions on Image Processing, vol. 31, pp. 2421–2432, 2022.
[19] N. Xu, B. Price, S. Cohen, J. Yang, and T. S. Huang, “Deep interactive object selection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 373–381, 2016.
[20] S. Yang, B. Wang, W. Li, Y. Lin, C. He, et al., “Unified interactive image matting,” arXiv preprint arXiv:2205.08324, 2022.
[21] J.-B. Cordonnier, A. Loukas, and M. Jaggi, “On the relationship between self-attention and convolutional layers,” arXiv preprint arXiv:1911.03584, 2019.
[22] Y. Liu, Q. Ren, J. Geng, M. Ding, and J. Li, “Efficient patch-wise semantic segmentation for large-scale remote sensing images,” Sensors, vol. 18, no. 10, p. 3232, 2018.
[23] G. Park, S. Son, J. Yoo, S. Kim, and N. Kwak, “MatteFormer: Transformer-based image matting via prior-tokens,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11696–11706, 2022.
[24] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[25] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, IEEE, 2009.
[26] D. Misra, “Mish: A self regularized non-monotonic activation function,” arXiv preprint arXiv:1908.08681, 2019.
[27] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.
[28] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., “PyTorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[29] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[30] I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” arXiv preprint arXiv:1608.03983, 2016.
[31] J. Li, J. Zhang, and D. Tao, “Deep automatic natural image matting,” arXiv preprint arXiv:2107.07235, 2021.
[32] J. Li, S. Ma, J. Zhang, and D. Tao, “Privacy-preserving portrait matting,” in Proceedings of the 29th ACM International Conference on Multimedia, pp. 3501–3509, 2021.
[33] J. Li, J. Zhang, S. J. Maybank, and D. Tao, “Bridging composite and real: towards end-to-end deep image matting,” International Journal of Computer Vision, vol. 130, no. 2, pp. 246–266, 2022.
[34] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755, Springer, 2014.
[35] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.” http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
[36] Q. Liu, S. Zhang, Q. Meng, R. Li, B. Zhong, and L. Nie, “Rethinking context aggregation in natural image matting,” arXiv preprint arXiv:2304.01171, 2023.
[37] C. Rhemann, C. Rother, J. Wang, M. Gelautz, P. Kohli, and P. Rott, “A perceptually motivated online benchmark for image matting,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1826–1833, IEEE, 2009.
[38] E. D. Cubuk, B. Zoph, J. Shlens, and Q. V. Le, “RandAugment: Practical automated data augmentation with a reduced search space,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 702–703, 2020.
