Graduate student: | 吳家榮 Chia-Jung Wu |
---|---|
Thesis title: | 在社群網路中應用文字探勘於交通事故分析之研究 A social network based text mining approach for traffic accident analysis |
Advisor: | 呂永和 Yung-Ho Leu |
Oral examination committee: | 楊維寧 Wei-Ning Yang, 陳雲岫 Yun-Shiow Chen |
Degree: | Master |
Department: | School of Management - Department of Information Management |
Year of publication: | 2016 |
Academic year of graduation: | 104 (ROC calendar) |
Language: | Chinese |
Number of pages: | 58 |
Keywords (Chinese): | 社群網路, YouTube, 交通肇事成因, 文字探勘, 資料探勘 |
Keywords (English): | Social Media, YouTube, Traffic Accident Analysis, Text Mining, Data Mining |
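With the rapid development of social media, the public is no longer limited to television or radio news reports for information and can receive faster and more varied information from social media. Members of the public, once only recipients of information, can now also use social media to provide it. Because information on social media is highly visible, social media has become an information source that enterprises, governments, and other organizations cannot ignore. The main purpose of this study is to use the public's comments on dashcam videos on social media to help the police identify the causes of traffic accidents. First, the study collects traffic accident videos from YouTube and manually assigns each video to one of four cause categories: "human factors", "vehicle factors", "environmental factors", and "road condition factors". Next, based on social media users' comments on the traffic accident videos, it builds a lexicon of primary-cause terms for each of the four categories. Then all comments on each video are merged, and four feature values are computed for each video: "human factor count", "vehicle factor count", "environmental factor count", and "road condition factor count". The training dataset is built in this way. Finally, the Random Forest algorithm is used for classification. The experimental results show that the proposed method achieves an overall classification accuracy of 94%, and that the average F-Measure over the four categories is 81%.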
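The feature construction described above (merging each video's comments and counting lexicon hits per cause category) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the lexicon terms, the English example comments, and the function name `count_features` are all hypothetical, and plain substring counting stands in for whatever text processing the author actually used.

```python
# Hypothetical lexicons: the thesis builds one lexicon per cause category
# from YouTube comments, but the abstract does not list the actual terms.
LEXICONS = {
    "human": ["speeding", "ran the red light", "drunk"],
    "vehicle": ["brake failure", "flat tire"],
    "environment": ["fog", "heavy rain"],
    "road": ["pothole", "blind corner"],
}


def count_features(comments):
    """Merge all comments of one video and count lexicon hits per category."""
    merged = " ".join(comments).lower()
    counts = {}
    for category, terms in LEXICONS.items():
        # Simple substring counting (an assumption made for this sketch).
        counts[category] = sum(merged.count(term) for term in terms)
    return counts


# Made-up comments for a single video.
comments = [
    "He was speeding and ran the red light.",
    "Speeding again, and in heavy rain too.",
]
features = count_features(comments)
print(features)
```

Each video then contributes one row of four counts plus its manually assigned label, which is the kind of training sample a classifier such as Random Forest would consume.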
Unlike TV or radio broadcasts, which only deliver information to the public, social media also let the public provide information. Due to its high visibility, variety, and penetration, social media has become an important source of information for the public, enterprises, and governments. The aim of this thesis is to automatically classify a traffic accident into a specific category according to the comments on the YouTube video of the accident. To this end, we first collected the metadata and comments of accident videos from YouTube. Then, we labeled each accident with a specific accident type by viewing the content of the video. The four accident types are "cause-by-man", "cause-by-car-condition", "cause-by-environment", and "cause-by-road-situation". Subsequently, we constructed a thesaurus for each accident type from all the comments belonging to that type. Afterwards, by referencing the thesauri, we constructed four attributes for each accident: "count-of-cause-by-man", "count-of-cause-by-car-condition", "count-of-cause-by-environment", and "count-of-cause-by-road-situation". The four "count-of-" attributes, together with another four attributes and the class label of the accident, constitute a training sample. The training samples of all the accidents constitute the training dataset of the classification problem. Finally, we used the Random Forest algorithm in Weka to solve this classification problem. Under 10-fold cross-validation, the classification accuracy was 94 percent and the average F-Measure was 81 percent. With these performance measures, the proposed method offers an efficient way to help find the cause of a traffic accident.
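The gap between the reported accuracy and the average F-Measure is typical when classes are imbalanced. The sketch below shows how the two metrics are computed and how they can diverge. The labels are made-up toy data, not the thesis's results, and treating the "average F-Measure" as an unweighted (macro) average over classes is an assumption, since the abstract does not say which averaging was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true, y_pred, label):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F-measures (assumed averaging)."""
    labels = sorted(set(y_true))
    return sum(f_measure(y_true, y_pred, c) for c in labels) / len(labels)


# Made-up, imbalanced toy labels: most accidents are "human".
y_true = ["human"] * 16 + ["vehicle"] * 2 + ["environment"] + ["road"]
y_pred = ["human"] * 16 + ["vehicle", "human"] + ["environment"] + ["human"]

acc = accuracy(y_true, y_pred)
macro_f = macro_f_measure(y_true, y_pred)
print(f"accuracy={acc:.3f} macro F-measure={macro_f:.3f}")
```

In this toy case the dominant class is predicted well, so accuracy stays high, while errors on the rare classes pull the macro average down.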