
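Graduate student: 戴鈺展 (Yu-Jhan Dai)
Thesis title: Automatic Verification of Stock Recommendations in Investment Social Webs (投資社群中個股推薦績效之自動驗證)
Advisor: 鍾聖倫 (Sheng-Luen Chung)
Oral examination committee: 李育杰, 陳俊良, 吳怡樂, 陸敬互
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 43
Chinese keywords: big data, veracity, prediction verification, social data analysis, stock market, relational database
English keywords: social webs, prediction validation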
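    Veracity is one more V-word characteristic of big data, on top of the three Vs of volume, velocity, and variety. In a sense, veracity is what actually determines the value of big data. Investment social websites commonly carry recommendation posts that advise buying or selling a particular stock based on its expected future trend, and posts making mutually contradictory predictions about the same stock sometimes appear at the same time. Neither the literature nor practice offers much work that checks the eventual veracity of such inherently forward-looking recommendations. This study verifies the short- and long-term performance of posts in investment-oriented communities that recommend going long or short on Taiwanese stocks. Following the recommendation-verification flow of the Oraclet (小卜卦) model, which consists of the preface, charge, prognostication, and verification, we first analyze the entity-relationship diagram needed for the registration, verification, and query functions on recommendation posts and build the corresponding relational database. Web crawlers then extract the predictive elements of every stock recommendation post in the investment communities and load the originally unstructured data into the relational database, which completes registration. At an appropriate time after registration, the system automatically computes and verifies short- and long-term investment returns from that day's closing prices. To speed up execution, the system moves its two most time-consuming parts, registration and performance computation, to distributed cloud computing; in a simulated three-node environment this reduced the system's computation time by half. As an automatic verifier of recommendation-post performance, the system built in this study systematically verifies, one by one, all of the roughly 10,000 stock recommendation posts made since 2011 on two investment communities: the PTT Stock board and the foreign-investor ratings board of cnYES (鉅亨網). The system is near real-time: every hour it automatically registers new buy or short recommendations from the two communities and verifies their performance on short- and long-term schedules after registration. Besides verifying the performance of a specific stock recommendation (equivalently, the veracity of its prediction), the system compiles predictions about the same stock that share or contradict a view, and compiles the historical recommendation performance of every PTT user or foreign institutional investor who has made recommendations in these communities, so that users can query them.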
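The registration step described above turns each unstructured post into rows of a relational database. As a minimal sketch, assuming a hypothetical three-table schema (`recommender`, `oraclet`, `verification`) and an assumed mapping of the Oraclet parts onto columns (preface: who posted and when; charge: which stock; prognostication: long or short; verification: the realized return at a given horizon), the following uses Python's standard `sqlite3` module. It is not the thesis's actual ER design, and the helpers `register` and `pending_oraclets` are illustrative names.

```python
import sqlite3

# Hypothetical, simplified schema: NOT the thesis's actual ER design.
# The mapping of Oraclet parts onto columns is an assumption, noted per column.
SCHEMA = """
CREATE TABLE recommender (
    recommender_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    source         TEXT NOT NULL,           -- e.g. ptt-stock or cnyes
    UNIQUE (name, source)
);
CREATE TABLE oraclet (
    oraclet_id     INTEGER PRIMARY KEY,
    recommender_id INTEGER NOT NULL REFERENCES recommender(recommender_id),  -- preface: who
    post_date      TEXT NOT NULL,           -- preface: when (ISO date)
    stock_code     TEXT NOT NULL,           -- charge: which stock
    direction      TEXT NOT NULL CHECK (direction IN ('long', 'short'))  -- prognostication
);
CREATE TABLE verification (
    oraclet_id     INTEGER NOT NULL REFERENCES oraclet(oraclet_id),
    horizon_days   INTEGER NOT NULL,        -- short- or long-term check point
    return_pct     REAL NOT NULL,           -- verification: realized directional return
    PRIMARY KEY (oraclet_id, horizon_days)
);
"""


def register(conn, name, source, post_date, stock_code, direction):
    """Registration (hypothetical helper): store one parsed post as a pending oraclet."""
    # Reuse the recommender row if this (name, source) pair was seen before.
    conn.execute(
        "INSERT OR IGNORE INTO recommender (name, source) VALUES (?, ?)", (name, source)
    )
    (rid,) = conn.execute(
        "SELECT recommender_id FROM recommender WHERE name = ? AND source = ?",
        (name, source),
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO oraclet (recommender_id, post_date, stock_code, direction) "
        "VALUES (?, ?, ?, ?)",
        (rid, post_date, stock_code, direction),
    )
    return cur.lastrowid


def pending_oraclets(conn):
    """Ids of oraclets that have no verification result yet."""
    rows = conn.execute(
        "SELECT o.oraclet_id FROM oraclet AS o "
        "LEFT JOIN verification AS v ON v.oraclet_id = o.oraclet_id "
        "WHERE v.oraclet_id IS NULL ORDER BY o.oraclet_id"
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
oid = register(conn, "user_a", "ptt-stock", "2016-01-04", "2330", "long")
conn.execute("INSERT INTO verification VALUES (?, ?, ?)", (oid, 5, 1.8))
```

Keeping verification results in their own table, keyed by oraclet and horizon, lets one oraclet accumulate both a short-term and a long-term result, and an oraclet with no rows there yet is simply pending.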


    Veracity is the fourth "V" attributed to big data, in addition to volume, velocity, and variety, and it may be the most critical factor in weighing the value of the data at hand. For example, stock-pick recommendations are commonly posted on investment-oriented social webs: bloggers and professionals alike recommend buying or shorting particular stocks, often alongside postings that express conflicting views on the same stock. However, such recommendations are rarely, if ever, followed up to check their validity, either in practice or in the research literature. This study tracks and validates Taiwan stock recommendations from two major investment social webs: the PTT Stock board and the foreign-investor ratings board of cnYES. In particular, we track the short- and long-term return performance of each recommendation. Following the Oraclet framework for verifying prediction documents, the system extracts three components from each recommendation, namely the preface, the charge, and the prognostication, in what is called the registration process; a subsequent verification process then validates the associated performance. To support these processes, an entity-relationship (ER) analysis is performed and a relational database is implemented accordingly. Crawlers pull stock recommendations from the two social webs, and each recommendation then goes through registration and verification. To speed up processing, the system is built on a cloud-computing framework in which the registration and verification processes run on Spark. As a demonstration, the implemented Automatic Verification of Stock Recommendation (AVSR) system continuously registers and verifies all recommendation postings from the two investment social webs since 2011; every one of the more than ten thousand recommendations has been verified. The system is near real-time: new recommendation postings are processed every hour.
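The verification step computes each recommendation's return from daily closing prices. A minimal sketch follows, under assumptions the abstract does not specify: entry at the close of the first trading day on or after the posting date, exit `horizon` trading days later, and a short recommendation scored by the negated price change. The function name `directional_return` and any particular choice of horizons are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left
from datetime import date
from typing import Optional


def directional_return(
    closes: list[tuple[date, float]], post_date: date, direction: str, horizon: int
) -> Optional[float]:
    """Percent return of a recommendation `horizon` trading days after posting.

    `closes` holds (trading_date, closing_price) pairs sorted by date.
    Returns None while the verification date has not been reached (pending).
    """
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    dates = [d for d, _ in closes]
    i = bisect_left(dates, post_date)  # first trading day on or after the post date
    j = i + horizon                    # exit day, counted in trading days
    if j >= len(closes):
        return None                    # not enough closing prices yet
    entry, exit_price = closes[i][1], closes[j][1]
    change = (exit_price - entry) / entry
    if direction == "long":
        return 100 * change
    if direction == "short":
        return -100 * change           # a short profits when the price falls
    raise ValueError(f"unknown direction: {direction!r}")
```

Calling it once with a short horizon and once with a long one (for example 5 and 60 trading days, values chosen only for illustration) yields the short- and long-term results; a `None` result marks an oraclet whose verification date has not arrived yet.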
Verification results and pending oraclets are aligned along a time axis for easy review. Recommendations with similar or conflicting views are compiled in batches for comparison. The performance histories of all bloggers and professionals appearing on these webs are also tracked.
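Both compilations described above reduce to grouping verified oraclets. The sketch below assumes a hypothetical in-memory record type `Verified` and two illustrative summaries: a per-recommender history (count, hit rate, mean return, where a "hit" is assumed to mean a positive directional return) and, per stock, the recommenders who took opposite sides. The thesis's own evaluation metrics are not reproduced here.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verified:
    """Hypothetical record for one registered oraclet and its verification result."""
    recommender: str
    stock_code: str
    direction: str                 # 'long' or 'short'
    return_pct: Optional[float]    # None while the oraclet is still pending


def recommender_history(records):
    """Summarize each recommender's verified oraclets; pending ones are skipped."""
    returns = defaultdict(list)
    for r in records:
        if r.return_pct is not None:
            returns[r.recommender].append(r.return_pct)
    return {
        who: {
            "n": len(rs),
            # Assumption: a "hit" is any positive return in the predicted direction.
            "hit_rate": sum(1 for x in rs if x > 0) / len(rs),
            "mean_return": sum(rs) / len(rs),
        }
        for who, rs in returns.items()
    }


def conflicting_views(records):
    """For each stock recommended both long and short, list who took each side."""
    sides = defaultdict(lambda: {"long": [], "short": []})
    for r in records:
        sides[r.stock_code][r.direction].append(r.recommender)
    return {code: s for code, s in sides.items() if s["long"] and s["short"]}
```

This grouping ignores when the opposing calls were made; a fuller version would likely restrict conflicts to overlapping verification windows.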

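    Chinese Abstract
    Abstract
    Acknowledgements
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Pattern Mining and Automatic Verification of Prediction Documents
      1.2 Predictions in Social Communities
      1.3 Literature Review
      1.4 Contributions and Organization of This Thesis
    Chapter 2 Characteristics of Predictive Social Posts and System Requirements
      2.1 Measuring the Veracity of Prediction Documents and Predictors
      2.2 Characteristics of Social Posts
      2.3 Illustration of the Oraclet Life Cycle
      2.4 Functional Requirements of the Automatic Verification System for Predictive Social Posts
    Chapter 3 Real-Time Automatic Verification of Social Posts
      3.1 ER Normalization of System Requirements
      3.2 Importing Unstructured Data into a Relational Database
      3.3 Oraclet Verification
    Chapter 4 Distributed Computing in the AVSR System
      4.1 Distributed Computing Method
      4.2 Distributed Computing Results
    Chapter 5 Veracity Verification Results for Investment Community Posts
      5.1 Targets of Analysis: the PTT Community and cnYES
      5.2 System Evaluation Metrics
      5.3 Veracity-Based Queries
    Chapter 6 Conclusion
    References
    Appendix A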

