| Graduate Student: | 許書豪 (Shu-Hao Xu) |
|---|---|
| Thesis Title: | 工業安全之整合深度模型警示系統 (Industrial Safety Integrated Deep Model Warning System) |
| Advisor: | 花凱龍 (Kai-Lung Hua) |
| Committee Members: | 鄭文皇 (Wen-Huang Cheng), 陳永耀 (Yung-Yao Chen), 陳宜惠 (Yi-Hui Chen), 孫士韋 (Shih-Wei Sun) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of Publication: | 2023 |
| Graduation Academic Year: | 111 (ROC calendar, 2022-2023) |
| Language: | Chinese |
| Number of Pages: | 40 |
| Keywords (Chinese): | 單目深度預測 (monocular depth estimation), 單物件追蹤 (single object tracking) |
| Keywords (English): | Monocular Depth Estimation, Single Object Tracking |
With the rapid development of industry worldwide, industrial safety incidents often originate in the blind spots of an operator's field of vision. To let operators observe these blind spots effectively and detect hazardous situations in real time, this thesis first simulates an actual work environment with a remote-controlled mechanical crane and deploys external devices (e.g., cameras and Bluetooth modules) to implement an electronic fence. The electronic fence combines a depth estimation model with an object detection model, so it can warn of potentially dangerous interactions and thereby reduce the occurrence of industrial safety accidents. The goal of this thesis is to use deep learning to develop depth estimation and object detection models that automatically detect surrounding hazards and issue timely alerts, enabling personnel to work in a safer environment.
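
To make the warning logic concrete, the following is a minimal sketch of how a monocular depth map and object detections might be fused into an electronic-fence check. This is an illustration under stated assumptions, not the thesis's implementation: `estimate_depth` and `detect_objects` are hypothetical stand-ins for the depth estimation and object detection models, and the 2 m danger radius is an invented example value.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Detection:
    label: str                      # object class, e.g. "person"
    box: tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def estimate_depth(frame: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the monocular depth model.

    Returns a per-pixel depth map in meters; here a dummy constant map
    so the sketch runs end to end.
    """
    return np.full(frame.shape[:2], 5.0)


def detect_objects(frame: np.ndarray) -> list[Detection]:
    """Hypothetical stand-in for the object detector (dummy output)."""
    return [Detection("person", (100, 80, 180, 240))]


def electronic_fence(frame: np.ndarray, danger_radius_m: float = 2.0) -> list[str]:
    """Warn for any detected object closer than the fence radius."""
    depth = estimate_depth(frame)
    warnings = []
    for det in detect_objects(frame):
        x1, y1, x2, y2 = det.box
        # Median depth inside the box is more robust than the mean to
        # background pixels included in the bounding box.
        distance = float(np.median(depth[y1:y2, x1:x2]))
        if distance < danger_radius_m:
            warnings.append(f"WARNING: {det.label} at {distance:.1f} m, inside fence")
    return warnings


if __name__ == "__main__":
    frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder camera frame
    print(electronic_fence(frame))  # dummy depth is 5 m, so no warning here
```

In the actual system the stand-ins would be replaced by the thesis's depth estimation network and detector, and any alert would be relayed to the operator, for example over the Bluetooth module mentioned in the abstract.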