
Author: Ian Benedict M. Ona (吳伊恩)
Thesis Title: SImP-Net: Single-Image Parts Segmentation by Disentangling Shape and Appearance
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Yung-Yao Chen (陳永耀), Yu-Ling Hsu (許聿靈)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2022
Graduation Academic Year: 110
Language: English
Pages: 42
Keywords: parts representation, parts segmentation

Abstract: Unsupervised part segmentation aims to label each pixel in an image as belonging to a part of an object. Prior works are able to learn object parts by leveraging the changes in geometry and appearance present in either multi-view image collections or videos. Works that use only a single image as input can segment parts only by color similarity, which limits the quality of the retrieved object parts. To capture part features beyond color, other works utilize a pre-trained model that does not generalize well to classes outside the ImageNet dataset, e.g., industrial products. To address this problem, we propose a novel segmentation network that learns parts by reconstructing the input with only a fixed number of part clusters and one appearance vector per part, effectively learning a disentangled part-appearance representation. This bottleneck encourages parts to be grouped by a common appearance vector, effectively encoding both color and texture. We use a mutual information loss to cluster pixels of similar appearance and a spatial continuity loss to group pixels that form local connections. To further constrain clusters to contain relevant parts, we propose a novel reassignment loss that encourages each cluster to contain at most one unique object part. We demonstrate competitive performance on the BSD500 dataset and also show that the disentangled shape and appearance representation can be used in other applications such as image editing.
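The bottleneck described in the abstract — reconstructing the image from only soft part masks and one appearance vector per part — can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes ((K, H, W) part logits, (K, 3) RGB appearance vectors); the function names, the plain-RGB appearance model, and the adjacent-pixel continuity penalty are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def softmax(logits, axis):
    """Numerically stable softmax along the given axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def reconstruct(part_logits, appearance):
    """Rebuild an image from the part-appearance bottleneck.

    part_logits: (K, H, W) per-pixel scores for K part clusters.
    appearance:  (K, 3) one RGB appearance vector per part.
    Returns an (H, W, 3) image composed only of soft part masks
    weighted by their shared appearance vectors.
    """
    masks = softmax(part_logits, axis=0)  # soft assignment of pixels to parts
    return np.einsum('khw,kc->hwc', masks, appearance)

def spatial_continuity_loss(masks):
    """Penalize assignment changes between horizontally and
    vertically adjacent pixels, encouraging locally connected parts."""
    dh = np.abs(masks[:, 1:, :] - masks[:, :-1, :]).mean()
    dv = np.abs(masks[:, :, 1:] - masks[:, :, :-1]).mean()
    return dh + dv
```

Because the decoder can only mix K appearance vectors, pixels that share texture and color are pushed into the same cluster; a spatially constant mask incurs zero continuity loss, while fragmented assignments are penalized.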

Contents:
Recommendation Letter
Approval Letter
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Literature
3 Method
  3.1 Overview
  3.2 Network Architecture
  3.3 Training
    3.3.1 Mutual Information Loss
    3.3.2 Spatial Continuity Loss
    3.3.3 Reassignment Loss
    3.3.4 Reconstruction Loss
4 Results
  4.1 Implementation Details
  4.2 Experiments
    4.2.1 Quantitative Evaluation
    4.2.2 Qualitative Analysis
    4.2.3 Ablation
    4.2.4 Image Editing
5 Conclusions
References

