
Author: Ian Benedict M. Ona (吳伊恩)
Thesis Title: SImP-Net: Single-Image Parts Segmentation by Disentangling Shape and Appearance
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Yung-Yao Chen (陳永耀), Yu-Ling Hsu (許聿靈)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2022
Graduation Academic Year: 110
Language: English
Pages: 42
Keywords: parts representation, parts segmentation

Abstract: Unsupervised part segmentation aims to label each pixel in an image as belonging to a part of an object. Prior works are able to learn object parts by leveraging the changes in geometry and appearance present in either multi-view image collections or videos. Works that use only a single image as input can segment parts only by color similarity, which limits the quality of the retrieved object parts. To capture part features beyond color, other works utilize a pre-trained model that does not generalize well to classes outside the ImageNet dataset, e.g., industrial products. To address this problem, we propose a novel segmentation network that learns parts by reconstructing the input with only a fixed number of part clusters and one appearance vector per part, effectively learning a disentangled part-appearance representation. This bottleneck encourages parts to be grouped by a common appearance vector, effectively encoding both color and texture. We use a mutual information loss to cluster pixels of similar appearance and a spatial continuity loss to group pixels that form local connections. To further constrain clusters to contain relevant parts, we propose a novel reassignment loss that encourages each cluster to contain at most one unique object part. We demonstrate competitive performance on the BSD500 dataset and also show that the disentangled shape and appearance representation can be used in other applications such as image editing.
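The bottleneck described in the abstract — reconstructing the image from only soft part masks and one appearance vector per part — can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes ((K, H, W) part logits, (K, 3) RGB appearance vectors); the function names, the plain-RGB appearance model, and the adjacent-pixel continuity penalty are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def softmax(logits, axis):
    """Numerically stable softmax along the given axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def reconstruct(part_logits, appearance):
    """Rebuild an image from the part-appearance bottleneck.

    part_logits: (K, H, W) per-pixel scores for K part clusters.
    appearance:  (K, 3) one RGB appearance vector per part.
    Returns an (H, W, 3) image composed only of soft part masks
    weighted by their shared appearance vectors.
    """
    masks = softmax(part_logits, axis=0)  # soft assignment of pixels to parts
    return np.einsum('khw,kc->hwc', masks, appearance)

def spatial_continuity_loss(masks):
    """Penalize assignment changes between horizontally and
    vertically adjacent pixels, encouraging locally connected parts."""
    dh = np.abs(masks[:, 1:, :] - masks[:, :-1, :]).mean()
    dv = np.abs(masks[:, :, 1:] - masks[:, :, :-1]).mean()
    return dh + dv
```

Because the decoder can only mix K appearance vectors, pixels that share texture and color are pushed into the same cluster; a spatially constant mask incurs zero continuity loss, while fragmented assignments are penalized.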

Contents:
Recommendation Letter
Approval Letter
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Literature
3 Method
  3.1 Overview
  3.2 Network Architecture
  3.3 Training
    3.3.1 Mutual Information Loss
    3.3.2 Spatial Continuity Loss
    3.3.3 Reassignment Loss
    3.3.4 Reconstruction Loss
4 Results
  4.1 Implementation Details
  4.2 Experiments
    4.2.1 Quantitative Evaluation
    4.2.2 Qualitative Analysis
    4.2.3 Ablation
    4.2.4 Image Editing
5 Conclusions
References

