
Author: 余孟霖 (Meng-Lin Yu)
Thesis Title: Machine learning-based system for PCCES project data auto-correction
Advisor: 蔡孟涵 (Meng-Han Tsai)
Committee Members: 郭榮欽, 謝佑明, 蔡明達
Degree: Master
Department: College of Engineering - Department of Civil and Construction Engineering
Publication Year: 2020
Graduation Academic Year: 108 (ROC calendar)
Language: English
Pages: 63
Chinese Keywords: 自然語言處理, 詞嵌入
English Keywords: Natural Language Processing, Word Embedding

    Under an administrative order of the Taiwan government, public construction projects with a budget of more than NT$10 million should use the Public Construction Cost Estimation System (PCCES) to formulate project budgets, and the PCCES coding accuracy rate should be at least 40%. Although PCCES plays an important role in Taiwan's public construction projects, its data contain many errors because of differing user habits, time-consuming system operation, and the lengthy training required before use. This study therefore proposes an automatic correction system (ACS), a text classification system based on natural language processing (NLP) and machine learning, to correct erroneous PCCES data automatically. The ACS has two main features: a data correction function, which converts unstructured data into structured data, and a recommendation function, which gives users a recommendation list for manual data correction. In the ACS, a language model capable of handling natural language was trained with machine learning techniques. With this model, the system classifies user input and returns answers based on the classification results. The ACS uses the continuous bag-of-words (CBOW) model to find the most similar data and then replaces the user input with the result. Users can also correct data manually using the ACS's recommendation list, which is built from the same results. For implementation, the system was used to correct real construction data in the PCCES database. The results show that the system can correct 18,511 pieces of data with an accuracy of 76%. In addition, a user test asked participants to perform the same tasks with both the ACS and PCCES. The recommendation function reduced operation time by 51.69% compared with the original system. These tests validated that the system can shorten users' operating time and eliminate the impact of users' familiarity with PCCES.
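
    The correction and recommendation steps described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the toy vectors stand in for a trained CBOW model, the item names are invented English placeholders (real PCCES entries are Chinese and would first need word segmentation), and the names `TOY_VECTORS`, `STANDARD_ITEMS`, `recommend`, `auto_correct`, and the 0.9 threshold are all assumptions made for illustration. The sketch averages word vectors per entry, ranks standard items by cosine similarity, and replaces the input only when the best match is close enough.

```python
import math

# Hypothetical toy word vectors standing in for a trained CBOW model.
TOY_VECTORS = {
    "concrete": [0.9, 0.1, 0.0],
    "cement": [0.8, 0.2, 0.1],
    "steel": [0.1, 0.9, 0.0],
    "rebar": [0.2, 0.8, 0.1],
    "paint": [0.0, 0.1, 0.9],
    "coating": [0.1, 0.0, 0.8],
}

# Hypothetical standard item names that user input should map onto.
STANDARD_ITEMS = ["concrete", "steel rebar", "paint coating"]


def embed(text, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    rows = [vectors[tok] for tok in text.lower().split() if tok in vectors]
    if not rows:
        return None
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user_input, items, vectors, top_k=3):
    """Return up to top_k (item, similarity) pairs, most similar first."""
    query = embed(user_input, vectors)
    if query is None:
        return []
    scored = []
    for item in items:
        item_vec = embed(item, vectors)
        if item_vec is not None:
            scored.append((item, cosine(query, item_vec)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def auto_correct(user_input, items, vectors, threshold=0.9):
    """Replace the input with the best match if it clears the (assumed) threshold."""
    ranked = recommend(user_input, items, vectors, top_k=1)
    if ranked and ranked[0][1] >= threshold:
        return ranked[0][0]
    return user_input  # leave unchanged; the user can pick from recommend()
```

    Calling `recommend("cement", STANDARD_ITEMS, TOY_VECTORS)` returns the ranked list a user would choose from, while `auto_correct` applies the top match automatically. Keeping the two functions separate mirrors the two features named in the abstract, automatic data correction and a recommendation list for manual correction.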

    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
        1.1 Data management
        1.2 Public construction cost estimation system
        1.3 Benefits and challenges of the PCCES
    2. Literature review
        2.1 Challenges in construction data management
        2.2 Unstructured data processing
        2.3 Machine learning-based methods
    3. Objective
    4. Methodology
        4.1 System overview
        4.2 Data processing module
            4.2.1 Raw data collection
            4.2.2 Text processor
        4.3 Search processing module
            4.3.1 Word embedding
            4.3.2 Similarity calculation
        4.4 Mapping process module
    5. Implementation
        5.1 Training data
            5.1.1 System manuals
            5.1.2 Blank valuations
        5.2 Module implementation
            5.2.1 Text processing
            5.2.2 Database
            5.2.3 Model training
            5.2.4 Searching
            5.2.5 Mapping
    6. Validation
        6.1 System evaluation
            6.1.1 Data source
            6.1.2 Classification
            6.1.3 Result and discussion
        6.2 User test
            6.2.1 Users
            6.2.2 Test scenario
            6.2.3 Result and discussion
    7. Discussion
        7.1 Contributions
        7.2 Limitations
    8. Conclusion
    References


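    Full-text release date: 2025/08/25 (campus network)
    Full-text release date: 2025/08/25 (off-campus network)
    Full-text release date: 2025/08/25 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)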