
Author: Yu-Ting Cheng (鄭伃婷)
Title: An Image Localization Method Based on Multi-Stage Feature Detection of Google Street View (基於谷歌街景多階段特徵檢測的影像定位方法)
Advisor: Pei-Li Sun (孫沛立)
Committee: Tzung-Han Lin (林宗翰), Kuo-Jui Hu (胡國瑞), Hung-Shing Chen (陳鴻興)
Degree: Master
Department: 應用科技學院 (College of Applied Science and Technology) - Graduate Institute of Color and Illumination Technology
Year of publication: 2021
Academic year of graduation: 109 (2020-2021)
Language: Chinese
Pages: 85
Keywords (Chinese): 影像定位, 基於內容的影像檢索, 特徵檢測, Google街景, 基於深度學習的物件偵測
Keywords (English): Image localization, Content-based image retrieval (CBIR), Feature detection, Google Street View, Deep learning-based object detection
Views: 281; Downloads: 0

With advances in technology, image processing has developed in diverse directions. In the field of image localization, image content features combined with GPS positioning are commonly used to query similar images. This study used the Google Street View service to collect an image data set of known streets, then input nearby street-view images to retrieve matching content from the data set, thereby achieving image localization. The localization pipeline includes detecting convenience-store signboards in the image, analyzing the image's color distribution and feature points, and ranking the data-set images by similarity.

This study first built a signboard image data set of Taiwan's five major convenience store chains and trained a signboard recognition model on it with the YOLOv4 convolutional neural network architecture; its accuracy falls roughly between 71% and 77%. Next, the scale-invariant feature transform (SIFT) model was improved by adding a leaf mask, which raises the accuracy of feature-point matching. For the image color feature model, this study describes the color distribution of an image by the proportions of its dominant colors and the spatial relationships of color differences.

Next, the three feature models above were combined and the accuracy of different combinations was analyzed. Among the color feature combinations, this study strengthened the important color features and developed a two-stage multi-resolution color feature model; 62 test images were used to query a set of 240 street-view images and validate three two-stage color feature models, with a best query accuracy of 56.5%. One model achieved higher accuracy on images containing buildings, while another suited images dominated by natural elements. A Gaussian naive Bayes classifier was used to separate these two image classes with high accuracy. After classifying the input image, the building class uses color classification and block color-difference vectors as its two-stage color feature model, while the nature class uses block and horizontal color-difference vectors. Finally, feature points are matched between the query image and the data-set images; over the 240-image street data set, the query accuracy of the 62 test images approaches 75%.


With the advancement of science and technology, image processing has developed in diverse directions. In the field of image localization, image content features combined with GPS positioning are often used to query similar images. This research used the Google Street View service to collect an image data set of known streets, and then input nearby street-view images to retrieve matching content from the data set, achieving image localization. The localization process includes identifying the signboards of popular convenience stores in the image, analyzing the image's color distribution and feature points, and sorting the data-set images by similarity.
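As a concrete illustration of the similarity-sorting step, the sketch below ranks a small data set against a query using simple per-channel color histograms. The histogram size and the L1 distance are illustrative assumptions, a minimal stand-in for the thesis's actual color feature models:

```python
import numpy as np

def color_histogram(img, bins=8):
    """Per-channel histogram of an RGB image (H, W, 3), L1-normalized."""
    hist = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
            for c in range(3)]
    hist = np.concatenate(hist).astype(float)
    return hist / hist.sum()

def rank_by_similarity(query, dataset):
    """Return data-set indices sorted by ascending histogram distance."""
    q = color_histogram(query)
    dists = [np.abs(q - color_histogram(img)).sum() for img in dataset]
    return np.argsort(dists)

rng = np.random.default_rng(0)
dataset = [rng.integers(0, 256, (64, 64, 3)) for _ in range(5)]
query = dataset[2].copy()                     # query identical to image 2
print(rank_by_similarity(query, dataset)[0])  # prints 2 (best match)
```

In a full CBIR pipeline, the top-ranked images from this coarse color stage would then be re-ranked by the finer feature-point matching stage.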

This research first established a signboard image data set of the five major convenience store chains in Taiwan and trained a signboard recognition model on it with the YOLOv4 convolutional neural network (CNN) architecture; the recognition accuracy was about 71% to 77%. Afterward, the scale-invariant feature transform (SIFT) model was improved by adding a leaf mask, which raised the accuracy of feature-point matching. The image color feature model describes the color distribution of an image by the proportions of its dominant colors and the spatial relationships of color differences.
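The leaf-mask idea can be sketched as follows. How the thesis actually constructs its mask is not specified in the abstract, so the green-dominance threshold below is a hypothetical stand-in:

```python
import numpy as np

def leaf_mask(img, margin=20):
    """Mask out likely foliage: pixels whose green channel dominates
    red and blue by more than `margin` (an illustrative threshold).
    Returns a uint8 mask (255 = keep for feature detection, 0 = ignore),
    the convention expected by OpenCV feature detectors."""
    r = img[..., 0].astype(int)
    g = img[..., 1].astype(int)
    b = img[..., 2].astype(int)
    green = (g - r > margin) & (g - b > margin)
    return np.where(green, 0, 255).astype(np.uint8)

# Example: a tiny image with one strongly green (foliage-like) pixel.
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (40, 200, 40)
mask = leaf_mask(img)
print(mask[0, 0], mask[1, 1])  # prints: 0 255 (green pixel excluded)
```

With OpenCV, such a mask can be passed to the detector, e.g. cv2.SIFT_create().detectAndCompute(gray, mask), so keypoints are extracted only outside the masked foliage regions, which tend to produce unstable, season-dependent features.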

Next, the three image feature models above were combined, and the accuracy of different combinations was analyzed. Among the color feature combinations, this research strengthened the important color features and developed a two-stage multi-resolution color feature model; 62 test images were used to query a set of 240 street-view images and validate three two-stage color feature models, with a best query accuracy of 56.5%. One model achieved higher accuracy on images containing buildings, while another suited images dominated by natural elements. A Gaussian naive Bayes classifier was used to separate these two image classes with high accuracy. After classifying the input image, the building class used color classification and block color-difference vectors as its two-stage color feature model, while the nature class used block and horizontal color-difference vectors. Finally, feature-point matching was performed between the query image and the data-set images; over the 240-image street data set, the query accuracy of the 62 test images was close to 75%.
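The routing step (classify the query as building-dominated or nature-dominated, then pick the matching two-stage color model) might look like the scikit-learn sketch below. The two scalar features (edge density, green-pixel ratio) and the toy training data are assumptions for illustration, since the abstract does not specify the classifier's inputs:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy training data: [edge_density, green_ratio] per image (assumed features).
# Class 0 = building-dominated scene, class 1 = nature-dominated scene.
X = np.array([[0.80, 0.10], [0.70, 0.20], [0.90, 0.15],   # buildings: many edges
              [0.20, 0.70], [0.30, 0.80], [0.25, 0.60]])  # nature: mostly green
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB().fit(X, y)

def choose_model(features):
    """Route a query image to its two-stage color feature model."""
    if clf.predict([features])[0] == 0:
        return "color classification + block color-difference vectors"
    return "block + horizontal color-difference vectors"

print(choose_model([0.85, 0.12]))  # building-like query -> first model
print(choose_model([0.20, 0.75]))  # nature-like query -> second model
```

Gaussian naive Bayes suits this step because it trains well on few labeled images and scores a query in constant time before the more expensive feature-point matching runs.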

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives
  1.3 Research Limitations
  1.4 Thesis Organization
Chapter 2 Literature Review
  2.1 Color Space Conversion
    2.1.1 RGB Color Space
    2.1.2 HSV Color Space
    2.1.3 HSL Color Space
    2.1.4 CIELAB Color Space
  2.2 Content-Based Image Retrieval (CBIR)
    2.2.1 Color Features
    2.2.2 Texture Features
    2.2.3 Shape Features
    2.2.4 Spatial Information
    2.2.5 Similarity Measurement and Indexing Methods
  2.3 Feature Point Detection Methods
    2.3.1 Harris Corner Detection
    2.3.2 SIFT (Scale-Invariant Feature Transform)
    2.3.3 SURF (Speeded-Up Robust Features)
    2.3.4 ORB (Oriented FAST and Rotated BRIEF)
  2.4 Deep Learning-Based Object Detection
    2.4.1 R-CNN (Region-based Convolutional Neural Network)
    2.4.2 Fast R-CNN
    2.4.3 Faster R-CNN
    2.4.4 YOLO (You Only Look Once)
    2.4.5 SSD (Single Shot Detector)
Chapter 3 Research Methods and Experimental Design
  3.1 Research Methods
    3.1.1 Street View Image Database
    3.1.2 Color-Difference Vector Computation
    3.1.3 Feature Point Matching
    3.1.4 Signboard Recognition Based on Object Detection
  3.2 Experimental Design
    3.2.1 Image Data Preprocessing
    3.2.2 Convenience-Store Signboard Recognition
    3.2.3 Improvement of the Feature Point Detection Model
    3.2.4 Color Feature Detection
    3.2.5 Experimental Procedure
Chapter 4 Experimental Results and Analysis
  4.1 Hardware and Environment
  4.2 Signboard Recognition Results
  4.3 Results of the Improved Feature Point Detection Model
    4.3.1 SIFT and SURF Models at Low Resolution
    4.3.2 Effect of the Leaf Mask
    4.3.3 Validation of the Feature Point Detection Model
  4.4 Color Feature Detection Results
    4.4.1 Effect of Different Color Spaces
    4.4.2 Effect of Different Color Spaces
    4.4.3 Comparison of Color-Difference Vector Computation Methods
  4.5 Model Tuning and Validation
    4.5.1 Comparison of Two-Stage Color Feature Model Combinations
    4.5.2 Results of the Improved Color Classification Model
    4.5.3 Experimental Validation of the Two-Stage Color Feature Model
    4.5.4 Multi-Resolution Color-Difference Model
    4.5.5 Image Classification Method
  4.6 Summary and Discussion of Results
Chapter 5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Research Directions
References
Appendix


Full-text release date: 2024/08/26 (campus network)
Full text not authorized for public access (off-campus network)
Full text not authorized for public access (National Central Library: Taiwan NDLTD system)