
Author: Kuang-Cheng Huang
Thesis Title: Independent Human Image Matting
Advisor: Nai-Jian Wang
Committee Members: Shun-Feng Su, Jing-Ming Guo, Shun-Ping Chung, Shyue-Kung Lu, Nai-Jian Wang
Degree: Master
Department: Department of Electrical Engineering, School of Electrical Engineering and Computer Science
Year of Publication: 2022
Academic Year of Graduation: 110 (2021-22)
Language: Chinese
Pages: 53
Keywords (Chinese): image matting, object tracking, image segmentation, connected-component labeling
Keywords (English): Image matting, Object tracking, Image segmentation, Fast connected-component labeling
Hits: 203; Downloads: 0
Image matting is the basis of many image applications, such as film compositing and teleconferencing. Matting problems fall into three types: constant-color matting, difference matting, and natural image matting.

In constant-color matting the background is a single known color; it is typically used in film production, where a green screen lets the background be replaced, and it is the easiest of the three. Difference matting first captures the known background and then captures the scene again with the person added; it needs extra processing at the boundaries. Natural image matting places no restriction on the background and is the most difficult matting problem. Its goal is to find the foreground to be kept: a single image may contain the desired foreground, the background, and unintended foreground objects at the same time, yet only the specified foreground should be retained. A foreground mask must therefore be found, and the desired foreground is recovered through this mask image. Early methods required manually labeling the foreground and background, which is very time-consuming.
Recent methods also impose many constraints on the input image, which makes them inconvenient in practice. This thesis combines several image algorithms with neural networks: given only a photo or a video, with the person to be kept framed manually at the start, the input passes through a series of operations to produce the mask image, and combining this mask with the original image yields the foreground, as illustrated in the sketch below.
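As a concrete illustration of this final compositing step (a minimal sketch, not code from the thesis), the mask acts as a per-pixel alpha value in the standard compositing equation C = aF + (1 - a)B. The file names and the white replacement background are assumptions for illustration:

```python
import cv2
import numpy as np

def composite(image, mask, background):
    """Per-pixel blend C = a*F + (1 - a)*B, where a is the mask scaled to [0, 1]."""
    a = (mask.astype(np.float32) / 255.0)[..., None]   # HxWx1 alpha channel
    blended = a * image.astype(np.float32) + (1.0 - a) * background.astype(np.float32)
    return blended.astype(np.uint8)

# Hypothetical inputs: the original frame, the predicted person mask,
# and a plain white background to paste the person onto.
frame = cv2.imread("frame.png")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
result = composite(frame, mask, np.full_like(frame, 255))
cv2.imwrite("composited.png", result)
```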
In this thesis, DeepLabV3 [1] first produces the coarse foreground mask of the people in the image; because this mask is very rough at the boundaries, a newly trained model refines them. Since DeepLabV3 [1] finds the coarse foreground masks of all people at once, object tracking [2], connected-component labeling [3], and image segmentation [4] are applied to obtain the coarse foreground mask of a single person.
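A minimal sketch of these two stages, assuming the off-the-shelf pretrained torchvision DeepLabV3 model rather than the thesis's own refinement network, which is not reproduced here. The component-voting heuristic for keeping the framed person and the box coordinates are illustrative assumptions, not the thesis's exact procedure:

```python
import cv2
import numpy as np
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet101

# Pretrained DeepLabV3; in its PASCAL VOC label set, class 15 is "person".
model = deeplabv3_resnet101(pretrained=True).eval()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def coarse_person_mask(bgr):
    """Binary mask (0/1) covering every person DeepLabV3 finds."""
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    x = normalize(torch.from_numpy(rgb).permute(2, 0, 1)).unsqueeze(0)
    with torch.no_grad():
        classes = model(x)["out"].argmax(dim=1)[0].numpy()
    return (classes == 15).astype(np.uint8)

def keep_selected_person(mask, box):
    """Keep only the connected component that overlaps the user-framed box."""
    x, y, w, h = box
    n_labels, labels = cv2.connectedComponents(mask)
    votes = np.bincount(labels[y:y + h, x:x + w].ravel(), minlength=n_labels)
    votes[0] = 0                        # never pick the background label
    return (labels == votes.argmax()).astype(np.uint8) * 255

frame = cv2.imread("frame.png")                        # hypothetical input
single = keep_selected_person(coarse_person_mask(frame), (120, 60, 200, 400))
```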
The experimental results show that the proposed method overcomes problems encountered by other image matting methods: the results are not affected by dynamic backgrounds or focus issues, and no manual foreground/background labeling is needed; only the person to be tracked has to be framed at the start. As the problem complexity rises the overall computational load rises with it, but the method removes the constraints that limit many other approaches.


Image matting is the basis of many image applications, such as film compositing and teleconferencing. There are three types of image matting problems: constant-color matting, difference matting, and natural image matting. Earlier methods required manual labeling of the foreground and background, which was time-consuming, and recent methods still place many constraints on the input image, which is inconvenient in application. This thesis combines several image algorithms and neural-network methods to solve natural image matting. It requires only a photo or a video, with the person to be retained framed manually at the beginning. After a series of operations the mask is obtained, and combining the mask with the original image yields the foreground.
In our method, DeepLabV3 [1] creates the rough foreground mask, which is very coarse at the boundary, so we train a new model to refine the boundary. DeepLabV3 [1] finds all people in the mask, so to capture the specified person's mask we use object tracking [2], connected-component labeling [3], and image segmentation [4].
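As an illustration of the tracking stage alone (a sketch under assumptions, not the thesis code): OpenCV's contrib package ships the MOSSE correlation-filter tracker of [2], which can follow the initially framed person from frame to frame. The video file name and the key handling are assumptions:

```python
import cv2

cap = cv2.VideoCapture("input.mp4")             # hypothetical input video
ok, frame = cap.read()
box = cv2.selectROI("frame the person", frame)  # initial manual framing
# MOSSE lives in the contrib package (pip install opencv-contrib-python).
tracker = cv2.legacy.TrackerMOSSE_create()
tracker.init(frame, box)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)          # box follows the person
    if found:
        x, y, w, h = (int(v) for v in box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:                    # Esc quits
        break
cap.release()
```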
Our experimental results show that our method overcomes problems encountered by other image matting methods: the results are not affected by dynamic backgrounds or focus problems, and no manual marking of the foreground and background is required; the user only needs to frame the person to be tracked at the start. As the complexity of the problem increases (rough edges, unexpected persons, overlapping persons), the overall computational load increases, but the method overcomes the limitations of many other approaches.

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
    1.1 Research Background and Motivation
    1.2 Literature Review
    1.3 Thesis Objectives
    1.4 Thesis Organization
Chapter 2 System Architecture
    2.1 Image Matting Methods
    2.2 System Architecture
    2.3 Problems with the System Architecture
    2.4 Improved System Architecture
Chapter 3 Human Matting and Network Architecture
    3.1 Training Data
    3.2 Data Augmentation
    3.3 Network Architecture
    3.4 Optimizer and Loss Function
    3.5 Masks Before and After Processing
Chapter 4 Single Person and Overlap Handling
    4.1 Object Tracking
    4.2 Fast Connected-Component Labeling
    4.3 Image Segmentation
    4.4 Person Overlap Handling
Chapter 5 Experimental Results and Analysis
    5.1 Experimental Environment Specifications
    5.2 Image Quality Assessment Metrics
    5.3 Test Datasets
    5.4 Experimental Results
        5.4.1 Parameter Settings
        5.4.2 Comparison with Other Image Matting Methods
        5.4.3 Comparison of Single-Person Segmentation
        5.4.4 Runtime Comparison
Chapter 6 Conclusion and Future Work
    6.1 Conclusion
    6.2 Future Research Directions
References

[1] Chen, L. C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587.
[2] Bolme, D. S., Beveridge, J. R., Draper, B. A., & Lui, Y. M. (2010). Visual object tracking using adaptive correlation filters. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2544-2550).
[3] He, L., Chao, Y., Suzuki, K., & Wu, K. (2009). Fast connected-component labeling. Pattern Recognition, 42(9), 1977-1987.
[4] Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2274-2282.
[5] Chuang, Y. Y., Curless, B., Salesin, D. H., & Szeliski, R. (2001). A Bayesian approach to digital matting. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001) (Vol. 2, pp. II-264).
[6] Sun, J., Jia, J., Tang, C. K., & Shum, H. Y. (2004). Poisson matting. ACM SIGGRAPH (pp. 315-321).
[7] Zheng, Y., & Kambhamettu, C. (2009). Learning based digital matting. IEEE 12th International Conference on Computer Vision (pp. 889-896).
[8] Singh, S., Jalal, A. S., & Bhatanagar, C. (2013). Automatic trimap and alpha-matte generation for digital image matting. Sixth International Conference on Contemporary Computing (IC3) (pp. 202-208).
[9] Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6), 679-698.
[10] Xu, N., Price, B., Cohen, S., & Huang, T. (2017). Deep image matting. IEEE Conference on Computer Vision and Pattern Recognition (pp. 2970-2979).
[11] Sengupta, S., Jayaram, V., Curless, B., Seitz, S. M., & Kemelmacher-Shlizerman, I. (2020). Background matting: The world is your green screen. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2291-2300).
[12] Lin, S., Ryabtsev, A., Sengupta, S., Curless, B. L., Seitz, S. M., & Kemelmacher-Shlizerman, I. (2021). Real-time high-resolution background matting. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8762-8771).
[13] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
[14] Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834-848.
[15] Li, J., Zhang, J., Maybank, S. J., & Tao, D. (2020). End-to-end animal image matting. arXiv preprint arXiv:2010.16188.
[16] Chang, H. T., & Su, J. T. (2016). Segmentation using codebook index statistics for vector quantized images. International Journal of Advanced Computer Science and Applications, 7(12), 59-65.
[17] Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., et al. (2014). Microsoft COCO: Common objects in context. European Conference on Computer Vision (pp. 740-755).

Full text available from 2024/06/27 (campus network)
Full text available from 2024/06/27 (off-campus network)
Full text available from 2024/06/27 (National Central Library: Taiwan NDLTD system)