
Author: Tadhg McCarthy
Thesis Title: SegNet: Utilising Segmentation Masks for Class-Agnostic Counting
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Yung-Yao Chen (陳永耀), Yu-Ling Hsu (許聿靈)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2022
Graduation Academic Year: 110
Language: English
Number of Pages: 39
Keywords: class-agnostic counting, extreme points, segmentation masks

Abstract:
Class-agnostic counting involves tallying the instances of any user-defined class. It is usually phrased as a matching problem in which the model finds all objects in a query image that match exemplar patches containing the target object. Typically, users define exemplar patches by placing bounding boxes around the target object. However, defining exemplars with bounding boxes inevitably captures both the target object (foreground) and its surrounding background, so the model may unintentionally match patches that resemble the background, leading to an inaccurate count. Moreover, for objects poorly represented by a bounding box (e.g., non-axis-aligned, irregular, or non-rectangular shapes), the exemplar patch may contain a disproportionate amount of background relative to foreground, leading to poor matches. In this paper, we propose to utilize segmentation masks to separate target objects from their background. We derive these segmentation masks from extreme points (the left-most, right-most, top-most, and bottom-most points of the object), which require no additional annotation effort from the user compared to annotating bounding boxes. Moreover, we design a module that learns the mask features as a residual to the object features, allowing the network to learn how to better incorporate the segmentation masks. Our model improves upon state-of-the-art methods by up to 3.7 MAE points on the FSC-147 benchmark dataset, demonstrating the effectiveness of our masking approach.
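The residual design mentioned in the abstract can be illustrated with a small sketch. The snippet below is a minimal, hypothetical PyTorch interpretation of the idea, not the thesis code: exemplar features are weighted by the segmentation mask, passed through a small convolutional branch, and the branch output is added back to the original features, so the network learns how strongly to incorporate the mask. The class name `MaskResidualModule`, the channel count, and the layer choices are assumptions made for illustration only.

```python
import torch
import torch.nn as nn


class MaskResidualModule(nn.Module):
    """Hypothetical sketch: refine exemplar features with a residual
    computed from mask-weighted features (layer sizes are assumptions)."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # Small convolutional branch that turns masked exemplar features into a residual.
        self.residual_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, exemplar_feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # exemplar_feat: (B, C, H, W) features from the bounding-box exemplar.
        # mask: (B, 1, H, W) soft segmentation mask resized to the feature-map size.
        masked_feat = exemplar_feat * mask            # suppress background activations
        residual = self.residual_branch(masked_feat)  # learn a correction from masked features
        return exemplar_feat + residual               # residual update keeps the original features


if __name__ == "__main__":
    module = MaskResidualModule(channels=256)
    feat = torch.randn(2, 256, 16, 16)
    mask = torch.rand(2, 1, 16, 16)
    print(module(feat, mask).shape)  # torch.Size([2, 256, 16, 16])
```

Because the masked features enter only as an additive correction, the module can fall back to the plain bounding-box features when the mask is uninformative, which matches the abstract's claim that the network learns how much of the mask to use.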


Table of Contents
Recommendation Letter ... i
Approval Letter ... ii
Abstract ... iii
Acknowledgements ... iv
Contents ... v
List of Figures ... vii
List of Tables ... ix
1 Introduction ... 1
2 Related Work ... 5
3 Method ... 6
3.1 Overview ... 6
3.2 Feature Extractor ... 8
3.3 Residual Module ... 10
3.4 Counting Module ... 10
3.5 Multi-scale exemplars ... 12
3.6 Test-Time Adaptation ... 12
4 Results ... 14
4.1 Implementation Details ... 14
4.2 Evaluation Metrics ... 14
4.3 Experiments ... 16
4.4 Ablation Studies ... 23
5 Conclusion ... 27
References ... 28

