
Graduate student: 朱柏誠 (Bo-Cheng Zhu)
Thesis title: 以數據增強法改善深度學習於小物件偵測之正確性:以鳥類偵測為例
Improvement of the Accuracy of Deep Learning in Small Object Detection through Data Augmentation: Bird Detection as an Example
Advisor: 林宗翰 (Tzung-Han Lin)
Oral defense committee: 羅梅君 (Mei-Chun Lo), 陳怡永 (Yi-Yung Chen), 孫沛立 (Pei-Li Sun), 林宗翰 (Tzung-Han Lin)
Degree: Master
Department: 應用科技學院 (College of Applied Sciences), Graduate Institute of Color and Illumination Technology
Year of publication: 2020
Academic year of graduation: 108
Language: Chinese
Number of pages: 94
Keywords (Chinese): 深度學習, 物件辨識, YOLO, 數據增強, 禽流感
Keywords (English): Deep Learning, Object Detection, You Only Look Once (YOLO), Data Augmentation, Avian Influenza

Abstract
In recent years, artificial intelligence has developed rapidly and broadly, and deep learning based on neural network architectures has become its most popular branch. The accuracy of deep-learning object recognition has risen year by year and is now high enough for real-world applications. Accuracy on small objects, however, remains low: as of mid-2020, even the best methods in public dataset challenges struggled to exceed 50% accuracy on small objects. The purpose of this study is therefore to improve the accuracy of small object recognition in deep learning. The study takes bird detection as its target application and uses it to drive away wild birds.
This study builds on three data augmentation methods: augmentation for small object detection, curriculum learning, and hard negative mining. These methods were improved and fused, then used to raise the model's accuracy on small objects. Experiments verified how much each method improved the recognition accuracy of the same model. Using bird detection as an example, the study also verified that the methods remain effective in a real environment, and combined them with purpose-built hardware to drive wild birds away from poultry farms and thereby prevent poultry from being infected with avian influenza.
Two main experiments were conducted. Experiment 1 analyzed the improved augmentation for small object detection in terms of model accuracy and real-field application. Experiment 2 analyzed the improved curriculum learning combined with hard negative mining in the same terms. The results show that both proposed methods substantially improved the original model's accuracy on small objects. The original model reached 18% accuracy on the small-object dataset; Experiment 1 reached 24%, and Experiment 2, building on Experiment 1, reached 30.48%. In the real field, the experimental laser equipment developed in this study performed poorly when combined with the Experiment 1 method, with a bird-repelling rate of only 13.22%. The same equipment combined with the Experiment 2 method performed far better, with a bird-repelling rate of 27.70%.
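The abstract names augmentation for small object detection without describing its mechanics. As a rough illustration only, the sketch below assumes a copy-paste style of augmentation: each small labeled object is copied and pasted at random positions that do not overlap any existing box. This is not the thesis's implementation. The function names `iou` and `copy_paste_small_objects`, the parameters `copies`, `max_area`, `max_tries`, and `seed`, the `(x, y, w, h)` box layout, and the 32 x 32 pixel area threshold (borrowed from the COCO convention for "small" objects) are all assumptions made for this example.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def copy_paste_small_objects(image, boxes, copies=2, max_area=32 * 32,
                             seed=0, max_tries=50):
    """Paste each small object `copies` times at non-overlapping positions.

    image: list of rows of pixel values (a hypothetical toy representation)
    boxes: list of (x, y, w, h) boxes lying fully inside the image
    Returns a new image and the extended list of boxes; inputs are not modified.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    out_boxes = list(boxes)
    for (x, y, w, h) in boxes:
        if w * h > max_area:
            continue  # only objects under the assumed "small" threshold are copied
        patch = [row[x:x + w] for row in image[y:y + h]]
        for _ in range(copies):
            for _ in range(max_tries):
                candidate = (rng.randint(0, width - w),
                             rng.randint(0, height - h), w, h)
                # Accept the position only if it touches no existing box.
                if all(iou(candidate, b) == 0.0 for b in out_boxes):
                    nx, ny = candidate[0], candidate[1]
                    for dy in range(h):
                        out[ny + dy][nx:nx + w] = patch[dy]
                    out_boxes.append(candidate)
                    break
    return out, out_boxes


# Toy usage: a 16 x 16 blank image containing one 3 x 2 "bird" of pixel value 9.
image = [[0] * 16 for _ in range(16)]
for yy in range(2, 4):
    for xx in range(3, 6):
        image[yy][xx] = 9
aug_image, aug_boxes = copy_paste_small_objects(image, [(3, 2, 3, 2)], copies=2)
```

The design choice here is to reject any paste position that overlaps an existing box, so every added label corresponds to a fully visible object. A real pipeline would typically also blend patch edges and apply the step only to a fraction of training images; those details are outside what the abstract states.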

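Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Thesis Structure
Chapter 2 Literature Review
2.1 Object Detection
2.1.1 Template Matching
2.1.2 Edge Detection
2.1.3 Feature Detection
2.1.4 Color Detection
2.2 Machine Learning
2.2.1 Supervised Learning
2.2.2 Unsupervised Learning
2.2.3 Reinforcement Learning
2.2.4 Deep Learning
2.2.5 Data Augmentation
2.3 Preventing Avian Influenza
2.3.1 History of the Origins of Avian Influenza
2.3.2 Methods for Eliminating Avian Influenza
2.3.3 Usage Scenarios
Chapter 3 Research Methods and Experimental Design
3.1 The You Only Look Once (YOLO) Algorithm
3.1.1 One Stage & Two Stage
3.1.2 YOLOv1
3.1.3 YOLOv2
3.1.4 YOLOv3
3.2 Hardware Architecture
3.2.1 Hardware Architecture of the TX2
3.2.2 Hardware Architecture of the Raspberry Pi
3.3 Experiment 1: Improved Augmentation for Small Object Detection, Analysis of Model Accuracy and Real-Field Application
3.3.1 Experimental Software Design
3.3.2 Experimental Hardware Design
3.3.3 Experimental Procedure
3.4 Experiment 2: Improved Curriculum Learning Combined with Hard Negative Mining, Analysis of Model Accuracy and Real-Field Application
3.4.1 Experimental Software Design
3.4.2 Experimental Hardware Design
3.4.3 Experimental Procedure
Chapter 4 Analysis and Comparison of Experimental Results
4.1 Effect of Data Augmentation Methods on Model Accuracy
4.1.1 Augmentation for Small Object Detection
4.1.2 Curriculum Learning and Hard Negative Mining
4.2 Analysis Method
4.3 Experimental Results
4.3.1 Results of Experiment 1
4.3.2 Results of Experiment 2
Chapter 5 Conclusions and Future Research Directions
References
Appendix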


Full-text release date: 2025/08/25 (campus network)
Full-text release date: 2025/08/25 (off-campus network)
Full-text release date: 2025/08/25 (National Central Library: 臺灣博碩士論文系統)