Author: | 周秉逸 Bing-Yi Chou |
---|---|
Thesis title: | 非結構化之網路探勘技術:由醫院看診進度之歷史資料推估到診區間 Non-structured Web Mining: Recommendation of Attending Intervals through Empirical Statistics Distribution |
Advisor: | 鍾聖倫 Sheng-Luen Chung |
Oral examination committee: | 李育杰, 吳怡樂, 陳俊良, 林希偉 |
Degree: | 碩士 Master |
Department: | 電資學院 - 電機工程系 Department of Electrical Engineering |
Year of publication: | 2016 |
Academic year of graduation: | 104 (ROC calendar) |
Language: | Chinese |
Pages: | 58 |
Chinese keywords: | 網頁內容探勘技術, 排隊系統, 實證統計方法, 關聯式資料庫 |
English keywords: | web mining, queuing system, empirical statistical distribution, relational database design |
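The statistical characteristics of queuing systems encountered in practice are difficult to model precisely. By collecting big data, that is, historical data gathered at large scale over long periods, we can analyze a queuing system through empirical statistics and, on that basis, estimate service and waiting times. The goal of this study is to archive the consultation progress that hospitals publish on their web pages for each clinic room, and to use empirical statistics to estimate when a given registration number at a given clinic room will be seen, so that registered patients receive a recommended time interval for arriving and spend less time waiting unnecessarily. To characterize a physician's consultation times empirically, this thesis accumulates the complete per-number consultation data of every shift for the corresponding clinic room. Accordingly, we apply relational database design, starting from an entity-relationship (ER) diagram analysis, to construct the corresponding tables and database. To handle unstructured web data, we use web crawling to extract the consultation progress embedded in each hospital's web pages and load it into the corresponding tables. Finally, empirical statistical computation yields the distribution of consultation times for each registration number, which serves as the recommended attending interval. The hospital attendance system implemented in this thesis has been running since early 2016, accumulating empirical statistics on the consultation progress of 國泰醫院 (Cathay General Hospital), 台大醫院 (National Taiwan University Hospital), and 榮民總醫院 (Veterans General Hospital). Through a graphical interface, users can obtain attending interval recommendations that reflect the characteristics of each clinic room, compare them with live consultation progress, and evaluate recommended attending times against actual consultation times. The unstructured web mining techniques presented here can also be applied to empirical statistical modeling of other queuing service systems.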
The statistical characteristics of real queueing systems are in general difficult to model. Open data harvested from the web over a long period offers a feasible way to characterize some queueing systems through empirical statistical distributions. This thesis recommends attending intervals for hospital visitors based on the visit progress commonly published on hospital websites, so that unnecessary waiting time can be avoided. We assume that the consultation time between physician and patient is idiosyncratic to each physician. In particular, the visiting time of a particular registration number should follow some unknown distribution, which can nevertheless be approximated by the empirical distribution when a large number of samples is available. Accordingly, we conduct an ER-diagram analysis and implement crawlers and parsers that continuously extract visit progress reports from three major hospitals in Taiwan, in order to build empirical distribution functions for all physicians on all clinic shifts. The implemented Recommendation System of Attending Intervals (RSAI) has been collecting visit progress from the three hospitals since the beginning of 2016. Through its GUI, users can view each physician's visiting time distribution, obtain attending interval recommendations from it, check visit progress against the predicted range in real time, and evaluate the recommendations. The web mining techniques developed here can be generalized to other queueing systems whose statistical characteristics are of interest.
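To make the idea of an empirical-distribution-based attending interval concrete, the following is a minimal Python sketch, not the thesis implementation. It assumes that, for one physician and shift, past call times of each registration number are stored as minutes after the shift starts. The names `history`, `ecdf`, `empirical_quantile`, and `attending_interval`, the sample values, and the 25%/75% quantile levels are all hypothetical choices made for illustration; the abstract does not state which quantile rule or levels RSAI uses.

```python
import math


def ecdf(samples, t):
    """Empirical distribution function: fraction of samples <= t."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(1 for s in samples if s <= t) / len(samples)


def empirical_quantile(samples, p):
    """Inverse of the ECDF: smallest observed value x with ecdf(x) >= p."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(samples)
    k = math.ceil(p * len(ordered))  # 1-based rank of the quantile
    return ordered[k - 1]


def attending_interval(history, number, lower=0.25, upper=0.75):
    """Suggest an arrival window for one registration number.

    The window is bounded by the lower and upper empirical quantiles
    of the times at which that number was called in past shifts.
    """
    times = history[number]
    return empirical_quantile(times, lower), empirical_quantile(times, upper)


# Hypothetical data: minutes after shift start at which registration
# number 15 was called in eight past shifts (illustrative values only).
history = {15: [62, 55, 71, 48, 66, 59, 80, 53]}

low, high = attending_interval(history, 15)
print(f"Number 15: suggested arrival between minute {low} and minute {high}")
```

A wider window (for example 10% to 90%) lowers the chance of missing the call at the cost of a longer possible wait; the appropriate trade-off depends on the clinic and is not specified here.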