
Author: 張鈺輝
Yui-Hui Chang
Thesis title: 距離夾角最似鄰近結合貝氏定理預測分類推論模式之研究
Distance and Cosine Angle-Based K-nearest neighbor Classification with Bayesian Framework
Advisor: 鄭明淵
Min-Yuan Cheng
Oral defense committee: 潘南飛
Pan, Nang-Fei
陳柏翰
Po-Han Chen
陳鴻銘
Hung-Ming Chen
Degree: Master
Department: College of Engineering - Department of Civil and Construction Engineering
Year of publication: 2015
Academic year of graduation: 103
Language: Chinese
Number of pages: 130
Keywords (Chinese): 分類分析, K-NN Classifier, Bayesian Theory, 餘弦相似度
Keywords (English): Classification Analysis, K-NN Classifier, Bayesian Theory, Cosine Similarity
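    In data mining, classification analysis is an important method for predictive data processing, and one of its most commonly used methods is the K-Nearest Neighbor algorithm. The K-Nearest Neighbor classifier uses Euclidean distance as its classification basis. Euclidean distance reflects the absolute difference between the numerical features of individual data points, but when distances are very close or even equal, the classification can be hard to determine. In addition, a typical classifier presents its result by directly declaring that a test instance belongs to one class. The test instance may also resemble other classes and have some probability of belonging to another class, yet because the classifier assigns it to a single class, the degree of membership cannot be expressed. Besides Euclidean distance, cosine similarity is another commonly adopted measure. When cosine similarity is used to gauge the similarity between data points, it distinguishes them only by direction and is insensitive to absolute magnitude, so it too has blind spots and shortcomings.
    This research therefore addresses these shortcomings, aiming to improve the K-NN Classifier and the way predictions are presented after classification. It examines the properties of the two measures, Euclidean distance and cosine similarity, and has the classification algorithm use both as its classification basis, yielding the θ-Means Nearest Neighbor Classifier (θ-MNN) algorithm. It also combines θ-MNN with Bayesian theory to develop the θ-Means Nearest Bayesian Classifier (θ-MNBC) algorithm, so that classification results provide more detailed information. The θ-MNBC and θ-MNN Classifier algorithms are applied to classify and analyze public works dispute resolution cases and road slope collapse cases.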


    In the field of data mining, classification analysis is an important method for predictive data processing, and the K-Nearest Neighbor (K-NN) classifier is one of its most commonly used methods. The K-NN classifier uses Euclidean distance as its classification basis. Euclidean distance shows the absolute difference between data points, but when data points are very close or even equidistant, the K-NN classifier may have difficulty classifying them. In addition, a conventional classifier assigns a test instance directly to one category, even though the instance may also be similar to other categories and so has some chance of belonging to another one. Besides Euclidean distance, cosine similarity is often used as a measure. Because cosine similarity distinguishes data only by direction and is insensitive to absolute values, it has blind spots and shortcomings of its own.
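    The two limitations described above can be illustrated with a small sketch. It is not taken from the thesis: the function names and the toy vectors are hypothetical, chosen only so that two neighbours tie on Euclidean distance while a third point is indistinguishable from the query under cosine similarity.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors (assumptions for illustration only).
query = (1.0, 0.0)
p = (2.0, 0.0)    # same direction as the query
q = (1.0, 1.0)    # different direction, same distance from the query as p
r = (10.0, 0.0)   # same direction as the query, much larger magnitude

# Euclidean distance ties for p and q, so distance alone cannot rank them.
tie = euclidean_distance(query, p) == euclidean_distance(query, q)

# Cosine similarity separates p from q by direction ...
cos_p = cosine_similarity(query, p)
cos_q = cosine_similarity(query, q)

# ... but rates r as identical to the query despite a distance of 9.
cos_r = cosine_similarity(query, r)
dist_r = euclidean_distance(query, r)
```

    In this toy case each measure resolves the ambiguity the other leaves open, which is the motivation the abstract gives for using both as the classification basis.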
    This research therefore explores these problems and aims to improve the K-NN classifier. It combines Euclidean distance and cosine similarity, using both as measures, to develop the θ-Means Nearest Neighbor (θ-MNN) Classifier algorithm. In addition, this study combines the θ-MNN classifier with Bayesian theory to develop the θ-Means Nearest Bayesian Classifier (θ-MNBC) algorithm, so that the classification result conveys more detailed information.
    Project dispute resolution cases and slope collapse cases are used to verify the classification model.
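    As a rough illustration of how a neighbour-based classifier can report degrees of membership instead of a single label, the sketch below uses the standard result for plain K-NN density estimation: with class-conditional density k_c / (n_c V) and prior n_c / n, Bayes' theorem gives the posterior k_c / k. This is not the thesis's θ-MNBC algorithm, whose steps are not given in this record. It uses Euclidean distance only, and the function name, class labels, and toy data are assumptions.

```python
import math
from collections import Counter

def knn_posteriors(train, query, k):
    """Estimate P(class | query) as the class share among the k nearest points.

    train: list of (feature_vector, label) pairs; query: feature vector.
    Assumes 1 <= k <= len(train).
    """
    # Sort training points by Euclidean distance to the query, keep the k nearest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    # Bayes' theorem with K-NN density estimates reduces to k_c / k.
    return {label: count / k for label, count in counts.items()}

# Hypothetical toy data: two features, two made-up class labels.
train = [
    ((1.0, 1.0), "stable"), ((1.2, 0.8), "stable"), ((0.9, 1.3), "stable"),
    ((3.0, 3.0), "collapse"), ((2.8, 3.2), "collapse"), ((2.0, 2.0), "collapse"),
]

posteriors = knn_posteriors(train, query=(1.8, 1.8), k=3)
# The hard label is the class with the largest posterior.
predicted = max(posteriors, key=posteriors.get)
```

    Here the three nearest neighbours are two "stable" points and one "collapse" point, so the sketch reports membership of 2/3 and 1/3 where a hard classifier would report only "stable".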

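    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background and Motivation
        1.2 Research Objectives
        1.3 Research Scope and Limitations
        1.4 Research Process and Methods
        1.5 Thesis Structure
    Chapter 2 Literature Review
        2.1 Data Mining
            2.1.1 Functions of Data Mining
        2.2 Classification Analysis
        2.3 K-Nearest Neighbor Algorithm
            2.3.1 Steps of the K-Nearest Neighbor Algorithm
            2.3.2 Characteristics of the K-Nearest Neighbor Algorithm
        2.4 K-Nearest Neighbor Density Estimation
            2.4.1 Steps of the K-NN Density Estimation Algorithm
        2.5 Introduction to Measurement Methods
        2.6 Bayesian Framework for Classification
            2.6.1 Bayes Classifier Formulas
    Chapter 3 θ-MNN Algorithm
        3.1 Problems with Euclidean Distance
        3.2 Problems with Cosine Similarity
        3.3 Problems with the K-NN Classifier
        3.4 θ-Means Nearest Neighbor Algorithm
            3.4.1 Steps of the θ-MNN Algorithm
        3.5 Characteristics of the θ-MNN Algorithm
    Chapter 4 θ-MNBC Algorithm
        4.1 Problems with the θ-MNN Classifier
        4.2 θ-MNBC Algorithm
            4.2.1 Steps of the θ-MNBC Algorithm
    Chapter 5 Case Testing and Analysis
        5.1 Road Slope Collapse Cases
            5.1.1 Case Collection and Case Base Construction
        5.2 Public Works Dispute Resolution Cases
        5.3 θ-MNN Classifier Case Testing and Analysis
            5.3.1 θ-MNN Algorithm: Road Slope Collapse Cases
            5.3.2 θ-MNN Classifier Algorithm: Public Works Dispute Resolution Cases
        5.4 θ-MNBC Case Testing and Analysis
            5.4.1 θ-MNBC Algorithm: Road Slope Collapse Cases
            5.4.2 θ-MNBC Algorithm: Public Works Dispute Resolution Cases
    Chapter 6 Conclusions and Recommendations
        6.1 Conclusions
        6.2 Recommendations
    References
    Appendix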

