
Author: Zhen-Dong Jia (賈鎮東)
Thesis Title: The Object Detection Application on Android Mobile Phone (物件偵測在Android手機裝置上之應用)
Advisor: Shi-Jinn Horng (洪西進)
Committee: Shi-Jinn Horng (洪西進), Chin-Shyurng Fahn (范欽雄), Thompson Yen (顏成安), Chin-Hsiung Wu (吳金雄)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2019
Graduation Academic Year: 107
Language: Chinese
Number of Pages: 39
Keywords: Object Detection, Deep Learning, Convolutional Neural Network, Mobile Device
    Android is a Linux-based, semi-open-source operating system used mainly on mobile devices. It offers a high degree of freedom and broad device compatibility, provides a complete development environment, and supports advanced graphics, networking, and camera processing. Its market share has grown steadily in recent years, and demand for Android applications has grown with it. This thesis applies object detection on the Android system. The advantage is that more and more devices run this operating system, so the range of devices the system can serve keeps widening; because Android apps are highly portable, any Android device can run the system proposed in this thesis.
    This thesis designs and implements a complete object detection system for mobile devices, focusing on the miniaturization and fast computation of deep neural networks. The mainstream SSD object detection algorithm is modified so that it runs in real time on a mobile phone. To shrink the model, the original VGG16 backbone is replaced with the lightweight MobileNet network, which greatly reduces the overall model size. To improve accuracy, multi-scale convolutional features are fused using Atrous Spatial Pyramid Pooling. A demo application on Android captures scenes from the camera and detects the objects in them in real time.
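    The MobileNet replacement described above relies on depthwise separable convolutions, which factor a standard convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) channel mix. As a rough illustration of why this shrinks the model (not the thesis's code; the layer shape below is hypothetical), the parameter savings can be computed directly:

    ```python
    # Parameter count of a standard convolution vs. a depthwise separable
    # convolution (the factorization MobileNet uses). Shapes are illustrative.

    def conv_params(k, c_in, c_out):
        """k x k standard conv: every output channel sees every input channel."""
        return k * k * c_in * c_out

    def depthwise_separable_params(k, c_in, c_out):
        """One k x k depthwise filter per input channel, then a 1x1 pointwise conv."""
        depthwise = k * k * c_in          # spatial filtering, channel by channel
        pointwise = 1 * 1 * c_in * c_out  # mixes channels
        return depthwise + pointwise

    # Example layer: 3x3 kernel, 256 -> 256 channels
    std = conv_params(3, 256, 256)                 # 589,824 parameters
    sep = depthwise_separable_params(3, 256, 256)  # 67,840 parameters
    print(std, sep, round(std / sep, 1))           # roughly 8.7x fewer parameters
    ```

    The same factorization also cuts multiply-accumulate operations by a similar factor, which is what makes real-time inference on a phone feasible.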

    Table of Contents:
    Chapter 1 Introduction
    1.1 Motivation and Purpose
    1.2 Thesis Organization
    Chapter 2 Related Work
    2.1 Literature Review
    Chapter 3 Object Detection
    3.1 The SSD Object Detection Algorithm
    3.1.1 Prior Box Settings
    3.1.2 Loss Function
    3.1.3 Network Architecture
    3.2 Network Model Miniaturization
    3.2.1 The MobileNet Network Structure
    Chapter 4 Design and Optimization of the Mobile Object Detection System
    4.1 The Lightweight MobileNet Network
    4.1 Shortcomings of SSD
    4.2 Feature Fusion Based on Atrous Convolution
    4.3 Multi-Scale Feature Pyramid
    4.4 Experimental Setup
    4.4.1 Runtime Environment
    4.4.2 Introduction to the COCO Dataset
    4.5 TensorFlow Lite (Android)
    4.6 System Performance
    4.6.1 Analysis of Experimental Results
    Chapter 5 Conclusion
    5.1 Research Results
    5.2 Future Prospects
    References
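    The feature fusion the thesis builds on uses atrous (dilated) convolution, which spaces the filter taps apart so the receptive field grows without adding parameters; ASPP applies several dilation rates in parallel and concatenates the results. A minimal 1-D sketch of the dilation mechanism itself (illustrative only, not the thesis's implementation):

    ```python
    import numpy as np

    def effective_kernel_size(k, rate):
        """A k-tap filter with dilation `rate` spans k + (k - 1)*(rate - 1) inputs."""
        return k + (k - 1) * (rate - 1)

    def dilated_conv1d_valid(x, w, rate):
        """'Valid' 1-D correlation with dilation: taps sit `rate` samples apart."""
        k_eff = effective_kernel_size(len(w), rate)
        out = []
        for i in range(len(x) - k_eff + 1):
            out.append(sum(w[j] * x[i + j * rate] for j in range(len(w))))
        return np.array(out)

    x = np.arange(10, dtype=float)
    w = np.array([1.0, 1.0, 1.0])       # 3 parameters, regardless of rate
    print(effective_kernel_size(3, 2))  # 5: same 3 taps, wider context
    print(dilated_conv1d_valid(x, w, 2))  # each output sums x[i] + x[i+2] + x[i+4]
    ```

    With rate 1 this reduces to an ordinary convolution; increasing the rate widens the context each output sees, which is why stacking several rates captures objects at multiple scales.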

    [1] Krizhevsky, A., Sutskever, I. and Hinton, G.E., "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
    [2] ImageNet Large Scale Visual Recognition Challenge http://www.image-net.org/challenges/LSVRC/
    [3] The PASCAL Visual Object Classes http://host.robots.ox.ac.uk/pascal/VOC/
    [4] ImageNet dataset http://image-net.org/
    [5] COCO dataset http://cocodataset.org/#home
    [6] Girshick, R., Donahue, J., Darrell, T. and Malik, J., "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587, 2014.
    [7] He, K., Zhang, X., Ren, S. and Sun, J., "Spatial pyramid pooling in deep convolutional networks for visual recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 37, no. 9 (2015): 1904-1916.
    [8] Girshick, R., "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision, pp. 1440-1448, 2015.
    [9] Ren, S., He, K., Girshick, R. and Sun, J., "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems, pp. 91-99, 2015.
    [10] Dai, J., Li, Y., He, K. and Sun, J., "R-FCN: Object detection via region-based fully convolutional networks." Advances in Neural Information Processing Systems, pp. 379-387, 2016.
    [11] Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R. and LeCun, Y., "OverFeat: Integrated recognition, localization and detection using convolutional networks." arXiv preprint arXiv:1312.6229, 2013.
    [12] Redmon, J., Divvala, S., Girshick, R. and Farhadi, A., "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788, 2016.
    [13] Redmon, J. and Farhadi, A., "YOLO9000: Better, faster, stronger." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263-7271, 2017.
    [14] Redmon, J. and Farhadi, A., "YOLOv3: An incremental improvement." arXiv preprint arXiv:1804.02767, 2018.
    [15] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y. and Berg, A.C., "SSD: Single shot multibox detector." European Conference on Computer Vision, pp. 21-37, Springer, Cham, 2016.
    [16] Lin, T.Y., Goyal, P., Girshick, R., He, K. and Dollár, P., "Focal loss for dense object detection." Proceedings of the IEEE International Conference on Computer Vision, pp. 2980-2988, 2017.
    [17] Simonyan, K. and Zisserman, A., "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556, 2014.
    [18] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M. and Adam, H., "MobileNets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861, 2017.
    [19] Zhao, H., Shi, J., Qi, X., Wang, X. and Jia, J., "Pyramid scene parsing network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881-2890, 2017.
    [20] Yu, F. and Koltun, V., "Multi-scale context aggregation by dilated convolutions." arXiv preprint arXiv:1511.07122, 2015.
    [21] Chen, L.C., Papandreou, G., Schroff, F. and Adam, H., "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587, 2017.
    [22] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A., "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
    [23] Ioffe, S. and Szegedy, C., "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167, 2015.
    [24] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z., "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818-2826, 2016.
    [25] Szegedy, C., Ioffe, S., Vanhoucke, V. and Alemi, A.A., "Inception-v4, Inception-ResNet and the impact of residual connections on learning." Thirty-First AAAI Conference on Artificial Intelligence, 2017.
