
Author: 許書豪
Shu-Hao Xu
Thesis Title: 工業安全之整合深度模型警示系統
Industrial Safety Integrated Deep Model Warning System
Advisor: 花凱龍
Kai-Lung Hua
Committee: 鄭文皇
Wen-Huang Cheng
陳永耀
Yung-Yao Chen
陳宜惠
Yi-Hui Chen
孫士韋
Shih-Wei Sun
Degree: 碩士
Master
Department: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
Thesis Publication Year: 2023
Graduation Academic Year: 111
Language: 中文
Chinese
Pages: 40
Keywords (in Chinese): 單目深度預測、單物件追蹤
Keywords (in other languages): Monocular Depth Estimation, Single Object Tracking
Reference times: Clicks: 216, Downloads: 5
  • 隨著世界工業的高度發展,各種工安突發事件往往源自於操作人員的視野死角所導致;為了使操作員能有效觀察視覺死角並即時偵測危害情境,本計畫首先透過機械遙控吊車模擬實際環境,並部署外部設備 (例如:攝影機、藍牙模組) 來達成電子柵欄之功能。本計畫將採用景深估計模型以及物件偵測模型混合來達成電子柵欄之功能,能夠對可能發生之危險互動提出警示,藉此降低工安事故的發生。本計畫的目標是利用深度學習開發景深估計及物件偵測模型,自動偵測周遭危機情況並適時給予警報,使人員能在更加安全之環境內工作。


    With the rapid development of industry worldwide, industrial safety incidents often originate in the operator's visual blind spots. To enable operators to observe these blind spots effectively and detect hazardous situations in real time, this thesis first uses a remote-controlled mechanical crane to simulate the actual environment and deploys external devices (such as cameras and Bluetooth modules) to realize an electronic fence. The thesis combines a depth estimation model with an object detection model to implement the electronic fence, which warns of potentially dangerous interactions and thereby reduces the occurrence of industrial safety accidents. The goal of this thesis is to use deep learning to develop depth estimation and object detection models that automatically detect surrounding hazards and raise timely alerts, so that personnel can work in a safer environment.
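    The core decision described above — combining a tracker's bounding box with a monocular depth map to decide whether an object has crossed the electronic fence — can be sketched as follows. This is an illustrative assumption, not the thesis's actual implementation: the function name `fence_alert`, its parameters, and the median-depth threshold rule are all hypothetical, and the depth map would in practice come from a model such as DPT-Hybrid or BTS rather than a synthetic array.

```python
import numpy as np

def fence_alert(depth_map, bbox, fence_depth_m, margin_m=0.5):
    """Return True if the tracked object has entered the virtual fence zone.

    depth_map     : HxW array of per-pixel depths in metres (e.g. predicted
                    by a monocular depth model such as DPT-Hybrid or BTS).
    bbox          : (x, y, w, h) box from a single-object tracker
                    (e.g. LightTrack).
    fence_depth_m : depth at which the electronic fence is placed.
    margin_m      : extra safety margin in front of the fence.
    """
    x, y, w, h = bbox
    patch = depth_map[y:y + h, x:x + w]
    # Median is more robust than mean against depth-estimation outliers
    # inside the box (e.g. background pixels leaking into the crop).
    obj_depth = float(np.median(patch))
    return obj_depth <= fence_depth_m + margin_m

# Toy scene: a 6x8 depth map at 10 m with an object patch at 2 m.
scene = np.full((6, 8), 10.0)
scene[2:4, 3:6] = 2.0
print(fence_alert(scene, (3, 2, 3, 2), fence_depth_m=3.0))  # → True
```

    In a live system this check would run once per frame on the tracker's latest box, with the alert wired to a buzzer or light via the Bluetooth module.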

    Table of Contents
    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Preliminaries
      2.1 Simulating the Real Environment with a Remote-Controlled Mechanical Crane
      2.2 Depth Estimation Module
      2.3 Object Tracking Module
      2.4 Embedded Systems
    3 Methodology
      3.1 Deploying External Devices
      3.2 Single Object Tracking Module - LightTrack
      3.3 Depth Estimation Module - DPT-Hybrid
      3.4 Depth Estimation Module - BTS
      3.5 Warning System Module
    4 Experimental Results and Analysis
      4.1 KITTI Dataset
      4.2 Comparison of Depth-Model FPS across Devices
      4.3 Experimental Results of the Running Warning Model
      4.4 Failure Cases
    5 Conclusion
    References

