
Graduate Student: 李博宬 (Bo-Cheng Li)
Thesis Title: 結合漸進式領域自適應與圖像聚類之物件偵測法 (Incremental Domain Adaptation with Image Clustering for Object Detection)
Advisors: 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen)
Oral Defense Committee: 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen), 賴坤財 (Kuen-Tsair Lay), 丘建青 (Chien-Ching Chiu), 呂政修 (Jenq-Shiou Leu)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Academic Year of Graduation: 110 (2021-2022)
Language: English
Number of Pages: 64
Keywords: Object detection, Domain adaptation, Convolutional neural network (CNN), Discriminator, Image style transfer
Hits: 202 views, 0 downloads

In conventional supervised learning without domain adaptation, training relies on a large amount of labeled data, yet annotating images demands considerable manpower and time, and supervised models also perform poorly when tested on images drawn from a different distribution. To achieve object detection across a variety of scenes, a model must be able to adapt to the domain shift between the training and test datasets. Domain adaptation is a method that learns from a labeled dataset and then adapts that knowledge to an unlabeled target dataset. However, the gap between domains makes this task very challenging. In this thesis, we propose a new method, combining image clustering and incremental domain adaptation, to mitigate the impact of the domain gap. First, we use image clustering to assign the source-domain images to groups of high visual similarity, ensuring that the labeled images fed into the subsequent domain adaptation process do not exhibit a large domain gap among themselves. Next, we apply an incremental domain adaptation scheme consisting of two steps. The first step generates a set of style-transferred images from the source and target data and treats them as a transitive domain. The second step performs two-stage domain adaptation: first from the source domain to the transitive domain, and then from the transitive domain to the target domain. In this way, the domain adaptation network avoids being trained directly across the significant domain gap between the source and target domains. Experimental results show that the proposed method achieves satisfactory performance on virtual-to-real, cross-camera, different-weather, and cross-city datasets.
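As an illustration of the clustering step, the following is a minimal, hypothetical sketch: it embeds each source-domain image with a pretrained ResNet-50 backbone and groups the embeddings with HDBSCAN, a density-based clusterer. The choice of backbone, the `hdbscan` package, and all parameter values are assumptions made for illustration only; the thesis does not state that these exact tools or settings are used.

```python
# Hypothetical sketch: group source-domain images by visual similarity before adaptation.
# Assumptions (not taken from the thesis): ResNet-50 features and the `hdbscan` package.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
import hdbscan


def extract_features(image_paths, device="cpu"):
    """Embed each image with a pretrained backbone (global average-pooled features)."""
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()  # keep the 2048-d pooled feature vector
    backbone.eval().to(device)
    preprocess = T.Compose([
        T.Resize((224, 224)),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    feats = []
    with torch.no_grad():
        for path in image_paths:
            img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
            feats.append(backbone(img).squeeze(0).cpu().numpy())
    return np.stack(feats)


def cluster_source_images(image_paths):
    """Density-based grouping: images in the same cluster share a similar appearance."""
    feats = extract_features(image_paths)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=10, metric="euclidean")
    labels = clusterer.fit_predict(feats)  # label -1 marks images treated as noise
    return labels
```

Each resulting cluster could then be fed into the adaptation pipeline as a set of labeled source images with a small internal domain gap.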


Traditional supervised learning without domain adaptation relies on a large number of bounding-box annotations, and annotating images costs considerable manpower and time. Moreover, supervised models do not perform well on test images drawn from a different distribution. To achieve accurate object detection across a variety of scenarios, the model needs to adapt to the domain shift between the training and test data. Domain adaptation is a method that learns to transfer from labeled source data to unlabeled target test data. However, the domain gap between the two generally makes the task challenging. In this thesis, we consider a new approach, consisting of image clustering and incremental adaptation, to mitigate the impact of the domain gap. First, we use a density-based image clustering scheme to distribute the source-domain images into a collection of groups with high visual similarity, so that the labeled images fed into the succeeding domain adaptation process do not exhibit an excessive domain gap among themselves. Afterwards, we apply a new incremental adaptation scheme that consists of two steps. The first step generates a set of style-transferred images from the source and target data, which serve as a transitive domain. The second step performs two-stage domain adaptation: from the source domain to the transitive domain, and then from the transitive domain to the target domain. As a result, the adaptation network avoids being trained directly across the significant domain gap between the source and target domains. Experimental results show that the new approach produces satisfactory performance in virtual-to-real, cross-camera, and different-weather or cross-city scenarios.
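As an illustration of how a transitive domain of style-transferred images might be produced, the sketch below swaps the low-frequency Fourier amplitude spectrum of a source image with that of a target image, a lightweight style-transfer technique. The function name, the `beta` band size, and the choice of Fourier amplitude swapping itself are assumptions made for illustration; the abstract does not specify which style-transfer method the thesis uses.

```python
# Hypothetical sketch of building a "transitive domain": re-style a source image
# toward the target appearance by swapping low-frequency Fourier amplitudes.
import numpy as np


def low_freq_amplitude_swap(source_img, target_img, beta=0.05):
    """Return the source image re-styled with the target's low-frequency amplitudes.

    source_img, target_img: float arrays of shape (H, W, C), pixel range [0, 255],
    and identical size. beta: fraction of the spectrum (around the lowest
    frequencies) to replace.
    """
    src_fft = np.fft.fft2(source_img, axes=(0, 1))
    tgt_fft = np.fft.fft2(target_img, axes=(0, 1))
    src_amp, src_phase = np.abs(src_fft), np.angle(src_fft)
    tgt_amp = np.abs(tgt_fft)

    # Centre the spectra so the low frequencies sit in the middle of the array.
    src_amp = np.fft.fftshift(src_amp, axes=(0, 1))
    tgt_amp = np.fft.fftshift(tgt_amp, axes=(0, 1))

    h, w = source_img.shape[:2]
    bh, bw = int(h * beta), int(w * beta)
    ch, cw = h // 2, w // 2
    # Replace the central (low-frequency) amplitude band with the target's band.
    src_amp[ch - bh:ch + bh, cw - bw:cw + bw] = tgt_amp[ch - bh:ch + bh, cw - bw:cw + bw]

    # Undo the shift, recombine with the original phase, and invert the transform.
    src_amp = np.fft.ifftshift(src_amp, axes=(0, 1))
    mixed = src_amp * np.exp(1j * src_phase)
    out = np.real(np.fft.ifft2(mixed, axes=(0, 1)))
    return np.clip(out, 0.0, 255.0)
```

Images produced this way keep the source content (and therefore its labels) while taking on target-like low-frequency appearance, which is the role a transitive domain plays between the two adaptation stages.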

Table of Contents:
Abstract (Chinese)
Abstract
Acknowledgment
Table of Contents
List of Figures
List of Tables
List of Acronyms
1 Introduction
2 Related Work
  2.1 Object Detection
  2.2 Domain Adaptation
  2.3 Domain Adaptation for Object Detection
  2.4 Density-based Image Clustering
  2.5 Image Style Transfer
  2.6 Summary
3 Proposed Method
  3.1 Proposed Architecture
  3.2 Density-based Image Clustering
  3.3 Incremental Domain Adaptation
    3.3.1 Transitive Domain
    3.3.2 Adaptation Process
  3.4 Loss Function
    3.4.1 Detection Network Loss
    3.4.2 Domain Discriminator Loss
    3.4.3 Loss Function for Overall Adversarial Training
  3.5 Summary
4 Experimental Results and Discussion
  4.1 Dataset
    4.1.1 Cityscapes Dataset
    4.1.2 Cityscapes Foggy Dataset
    4.1.3 Sim10k Dataset
    4.1.4 Kitti Dataset
    4.1.5 BDD100k Dataset
  4.2 Experimental Setup
  4.3 Evaluation Metrics
  4.4 Experimental Results
    4.4.1 Weather Adaptation
    4.4.2 Virtual to Real Adaptation
    4.4.3 Scene Adaptation
    4.4.4 Cross Camera Adaptation
  4.5 Failure Cases and Error Analysis
    4.5.1 Similar Appearance of Different Class Objects
    4.5.2 Identification Accuracy of Major Classes
  4.6 Summary
5 Conclusion and Future Works
  5.1 Conclusion
  5.2 Future Works
References


Full text release date: 2024/08/23 (campus network)
Full text release date: 2024/08/23 (off-campus network)
Full text release date: 2024/08/23 (National Central Library: Taiwan NDLTD system)