
Graduate student: Denny Kusoemo
Thesis title: Text Mining-based Construction Site Accident Classification Using Hybrid Supervised Machine Learning Model
Advisor: Min-Yuan Cheng (鄭明淵)
Oral examination committee: Lu-Shou Sheng (呂守陞), Ceng-Ren Jie (曾仁杰), Gao-Ming Xiu (高明秀)
Degree: Master
Department: College of Engineering - Department of Civil and Construction Engineering
Year of publication: 2019
Academic year of graduation: 107
Language: English
Number of pages: 79
Keywords: Construction Project Safety, Construction Safety Document, Gated Recurrent Unit, GRU, SOS, Classification of Accidents Cause
    Construction project safety performance is a major concern in the construction industry. Accidents on construction projects not only cause severe health issues but also lead to huge financial losses. These accidents are usually documented in the form of accident narratives, each consisting of an accident summary and a cause classification. Because documenting hundreds of these accident narratives can require vast resources and effort, an AI model is considered a favorable solution to this classification problem. Nevertheless, previously implemented models still have room for improvement in terms of performance. For instance, Decision Tree, KNN, Naïve Bayesian, SVM, and LR are categorized as weak learners that display a substantial error rate. In this regard, this study proposes a hybrid model combining the Gated Recurrent Unit (GRU) and Symbiotic Organisms Search (SOS), named the Symbiotic Gated Recurrent Unit (SGRU). The SOS algorithm searches for the best GRU parameters to ensure optimal performance of the model. The proposed model is then applied to and evaluated on real construction project accident narratives as a case study. The experimental results demonstrate promising performance of SGRU in classifying accident causes. By providing notable classification performance and outperforming the other applied AI models, SGRU demonstrates the capability to aid the development of prevention strategies for future use in the construction industry.
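
    The parameter search described in the abstract can be illustrated with a small sketch. It follows the three SOS phases (mutualism, commensalism, parasitism) as published by Cheng and Prayogo (2014), but it is not the thesis implementation: the objective `toy_validation_loss`, its two parameters (number of hidden units and dropout rate), the bounds, and the population settings are all assumptions made for this example. In an actual SGRU setup the objective would presumably train a GRU with the candidate parameters and return a validation error.

```python
import random


def sos_minimize(objective, bounds, pop_size=10, iterations=30, seed=0):
    """Minimal Symbiotic Organisms Search sketch (minimization)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        # Keep every coordinate inside its (low, high) bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]

    def pick_other(i):
        # Random index different from i.
        j = rng.randrange(pop_size - 1)
        return j if j < i else j + 1

    def try_replace(k, candidate):
        # Greedy selection: keep the candidate only if it improves organism k.
        candidate = clip(candidate)
        f = objective(candidate)
        if f < fit[k]:
            pop[k], fit[k] = candidate, f

    for _ in range(iterations):
        for i in range(pop_size):
            best = pop[min(range(pop_size), key=fit.__getitem__)]

            # Mutualism: organisms i and j both move toward the best organism.
            j = pick_other(i)
            mutual = [(a + b) / 2 for a, b in zip(pop[i], pop[j])]
            bf1, bf2 = rng.choice((1, 2)), rng.choice((1, 2))
            new_i = [a + rng.random() * (b - m * bf1)
                     for a, b, m in zip(pop[i], best, mutual)]
            new_j = [a + rng.random() * (b - m * bf2)
                     for a, b, m in zip(pop[j], best, mutual)]
            try_replace(i, new_i)
            try_replace(j, new_j)

            # Commensalism: only organism i benefits from organism j.
            j = pick_other(i)
            new_i = [a + rng.uniform(-1, 1) * (b - c)
                     for a, b, c in zip(pop[i], best, pop[j])]
            try_replace(i, new_i)

            # Parasitism: a mutated copy of i tries to displace organism j.
            j = pick_other(i)
            parasite = list(pop[i])
            for d in rng.sample(range(dim), rng.randint(1, dim)):
                lo, hi = bounds[d]
                parasite[d] = rng.uniform(lo, hi)
            try_replace(j, parasite)

    k = min(range(pop_size), key=fit.__getitem__)
    return pop[k], fit[k]


def toy_validation_loss(params):
    # Hypothetical stand-in for "train a GRU, return validation error".
    units, dropout = params
    return (units - 64.0) ** 2 / 1000.0 + (dropout - 0.3) ** 2


# Assumed search ranges: hidden units in [8, 256], dropout rate in [0, 0.8].
BOUNDS = [(8.0, 256.0), (0.0, 0.8)]

best_params, best_loss = sos_minimize(toy_validation_loss, BOUNDS)
print("best parameters:", best_params, "loss:", best_loss)
```

    Because each phase accepts a candidate only when it improves on the organism it would replace, the best loss in the population never gets worse from one iteration to the next. With a real GRU as the objective, every evaluation is a full training run, so the population size and iteration count would need to be chosen with that cost in mind.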



    ABSTRACT
    ACKNOWLEDGEMENT
    TABLE OF CONTENTS
    ABBREVIATIONS AND SYMBOLS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1: INTRODUCTION
        1.1 Research Background
        1.2 Research Objective
        1.3 Research Scope and Assumption
        1.4 Research Methodology
        1.5 Research Outline
    CHAPTER 2: LITERATURE REVIEW
        2.1 Related Works of Construction Accident Narrative Classification
        2.2 Natural Language Processing (NLP)
            2.2.1 Natural Language Toolkit (NLTK)
            2.2.2 English Stop Words
            2.2.3 Tokenization
            2.2.4 Global Vector for Word Representation (GloVe)
        2.3 Gated Recurrent Unit (GRU)
        2.4 Adaptive Moment Estimation (ADAM)
        2.5 Dropout
        2.6 Symbiotic Organisms Search (SOS)
    CHAPTER 3: MODEL CONSTRUCTION
        3.1 Overview of Text Mining-Based Symbiotic Gated Recurrent Unit (SGRU) Model
            3.1.1 Natural Language Processing (NLP) Phase
            3.1.2 Symbiotic Organism Search – Gated Recurrent Unit (SGRU) Classifier Phase
        3.2 Performance Evaluation Criteria
    CHAPTER 4: MODEL ESTIMATION AND VALIDATION
        4.1 Data Collection
        4.2 Data Preparation
            4.2.1 Text Cleaning Process
            4.2.2 Stop Words Removal Process
            4.2.3 Tokenization Process
            4.2.4 Data Division Process
            4.2.5 Embedding Process
        4.3 SGRU Evaluation and Implementation
            4.3.1 SGRU Implementation Result
            4.3.2 Result Comparison with other AI Techniques
    CHAPTER 5: CONCLUSIONS AND RECOMMENDATIONS
        5.1 Conclusions
        5.2 Recommendations
    APPENDIX
        A-1 OSHA dataset (Goh & Ubeynarayana, 2017)

    Breiman, L. (2017). Classification and regression trees: Routledge.
    Chen, L., Vallmuur, K., & Nayak, R. (2015). Injury narrative text classification using factorization model. BMC Medical Informatics and Decision Making, 15(1), S5. doi:10.1186/1472-6947-15-S1-S5
    Cheng, M.-Y., & Prayogo, D. (2014). Symbiotic organisms search: a new metaheuristic optimization algorithm. Computers & Structures, 139, 98-112.
    Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
    Chokor, A., Naganathan, H., Chong, W. K., & Asmar, M. E. (2016). Analyzing Arizona OSHA Injury Reports Using Unsupervised Machine Learning. Procedia Engineering, 145, 1588-1593. doi:10.1016/j.proeng.2016.04.200
    Dasarathy, B. V. (1991). Nearest neighbor (NN) norms: NN pattern classification techniques.
    Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res., 12, 2121-2159.
    Goh, Y. M., & Ubeynarayana, C. U. (2017). Construction accident narrative classification: An evaluation of text mining techniques. Accident Analysis & Prevention, 108, 122-130. doi:10.1016/j.aap.2017.08.026
    Guggilla, C., Miller, T., & Gurevych, I. (2016). CNN-and LSTM-based claim classification in online user comments. Paper presented at the Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers.
    Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
    Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (Vol. 398): John Wiley & Sons.
    Khurana, D., Koli, A., Khatter, K., & Singh, S. (2017). Natural language processing: State of the art, current trends and challenges. arXiv preprint arXiv:1708.05148.
    Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
    Kotsiantis, S. B., Zaharakis, I., & Pintelas, P. (2007). Supervised machine learning: A review of classification techniques. Emerging artificial intelligence applications in computer engineering, 160, 3-24.
    Poh, C. Q. X., Ubeynarayana, C. U., & Goh, Y. M. (2018). Safety leading indicators for construction sites: A machine learning approach. Automation in Construction, 93, 375-386. doi:10.1016/j.autcon.2018.03.022
    Russell, S. J., & Norvig, P. (2016). Artificial intelligence: a modern approach: Malaysia; Pearson Education Limited.
    Shen, G., Tan, Q., Zhang, H., Zeng, P., & Xu, J. (2018). Deep learning with gated recurrent unit networks for financial sequence predictions. Procedia computer science, 131, 895-903.
    Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.

    Tieleman, T., & Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2), 26-31.
    Tixier, A. J. P., Hallowell, M. R., Rajagopalan, B., & Bowman, D. (2016). Automated content analysis for construction safety: A natural language processing system to extract precursors and outcomes from unstructured injury reports. Automation in Construction, 62, 45-56. doi:10.1016/j.autcon.2015.11.001
    Ubeynarayana, C., & Goh, Y. (2017). An Ensemble Approach for Classification of Accident Narratives.
    Vapnik, V., Golowich, S. E., & Smola, A. J. (1997). Support vector method for function approximation, regression estimation and signal processing. Paper presented at the Advances in neural information processing systems.
    Zhang, F., Fleyeh, H., Wang, X., & Lu, M. (2019). Construction site accident analysis using text mining and natural language processing techniques. Automation in Construction, 99, 238-248. doi:10.1016/j.autcon.2018.12.016

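    Full-text release date: 2022/08/20 (campus network)
    Full-text release date: 2024/08/20 (off-campus network)
    Full-text release date: 2024/08/20 (National Central Library: Taiwan master's and doctoral theses system)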