A Concept-Based Scholarly Literature Search Method Built on Citation Relations and Paper Title Analysis

Graduate student:	Jheng-Peng Huang (黃正鵬)
Thesis title:	Knowledge Inference through Contextual Analysis of Titles in Citation Networks (Chinese title: 運用論文引用關係與論文標題分析建構基於概念之學術文獻搜尋方法)
Advisor:	Tien-Ruey Hsiang (項天瑞)
Oral examination committee:	Chuan-Kai Yang (楊傳凱), Hsing-Kuo Pao (鮑興國)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication:	2015
Academic year of graduation:	103 (ROC calendar)
Language:	English
Number of pages:	42
Chinese keywords:	概念搜尋、學術文獻綜述、論文網路分析
English keywords:	conceptual retrieval, literature review, citation network analysis

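Literature retrieval systems that rely on keyword and full-text matching often fail to return results that are sufficient for, and suited to, a literature review, so searching for scholarly articles and carrying out research consume a great deal of time spent organizing information. In the early stage of a study, researchers often lack background knowledge of a field and are therefore prone to searching with the wrong keywords, which yields unimportant or even irrelevant literature. The two main causes are word ambiguity (one word with several meanings, or several words with one meaning) and insufficient background knowledge. Because a combination of words both resolves word-sense confusion and expresses a complete concept more clearly, this thesis represents scholarly articles by concepts and the semantic relations among them, replacing the conventional term-document vectors. The thesis proposes a method for retrieving documents by concept, to address the problems that word ambiguity causes in term or keyword matching search. A graph-based model represents the concepts contained in the titles of scholarly articles, together with the related concepts and research obtained through citation relations. Graph algorithms and citation network analysis are then used to compute a score expressing the relative importance of each concept in each document. After the user enters a topic, the method returns research related to that topic according to its related concepts and the computed importance score of each concept in each document, providing more complete knowledge for further academic research. The proposed method is implemented on the CiteSeerX paper database, which contains about 3.6 million scholarly papers. Searching with topics from different fields, we find that the returned results are consistently relevant to the topic and also reveal the techniques, theories, and other knowledge involved, as well as the development trends of these related concepts.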

The literature review process is often time-consuming because search based on keyword or text matching cannot provide sufficient knowledge. In particular, in the early stage of a scientific study, keyword-based information retrieval often suffers from improper search terms supplied by the user. The two primary causes of misused keywords are insufficient background knowledge and word ambiguity. Since a composition of words can eliminate ambiguity and express a more accurate concept, this thesis adopts concepts and their semantic relations, instead of term-document vectors, to represent a scientific article. It proposes a concept retrieval method that handles the synonymy and polysemy of search terms as well as keyword mismatches. A graph-based model represents the concepts contained in the titles of scientific articles and the corresponding citation relations. Concept scores are then derived to estimate the significance of each concept relative to a given article. For a given query, related articles are retrieved and evaluated according to the importance of the induced concepts, so that users can continue their literature review in a more informed fashion. We have implemented a prototype system based on the CiteSeerX data, which contains over 3.6 million article records. We evaluated queries from different research domains and showed that the proposed method successfully extracts articles with related concepts that are not limited to the exact keywords specified in a query.
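
The pipeline described above (concepts taken from titles, citation links, per-article concept scores, concept-based ranking) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the bigram concept extractor, the 0.5 weight for concepts borrowed over a citation link, the word-overlap matching rule, the sample data, and all function names are assumptions made for illustration, since the abstract does not state the actual scoring formula.

```python
from collections import Counter, defaultdict

# Hypothetical stopword list, chosen only for this toy example.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "with", "to", "using"}


def extract_concepts(title):
    """Toy concept extractor (assumption): adjacent pairs of non-stopword tokens."""
    words = [w for w in title.lower().split() if w not in STOPWORDS]
    return {" ".join(pair) for pair in zip(words, words[1:])}


def concept_scores(titles, citations):
    """Score each concept per paper from its own title and its citation neighbours."""
    neighbours = defaultdict(set)
    for citing, cited in citations:
        neighbours[citing].add(cited)
        neighbours[cited].add(citing)
    scores = {}
    for paper, title in titles.items():
        counts = Counter()
        for concept in extract_concepts(title):
            counts[concept] += 1.0  # concept stated in the paper's own title
        for other in neighbours[paper]:
            for concept in extract_concepts(titles[other]):
                counts[concept] += 0.5  # assumed weight for a concept one citation away
        total = sum(counts.values())
        # Normalise so the scores of one paper sum to 1.
        scores[paper] = {c: v / total for c, v in counts.items()} if total else {}
    return scores


def search(query, scores):
    """Rank papers by the summed score of concepts sharing a word with the query."""
    terms = set(query.lower().split())
    ranked = []
    for paper, concepts in scores.items():
        s = sum(v for c, v in concepts.items() if terms & set(c.split()))
        if s > 0:
            ranked.append((paper, s))
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Made-up sample data: paper id -> title, and (citing, cited) pairs.
TITLES = {
    "p1": "Graph-based term weighting for information retrieval",
    "p2": "Random walk term weighting",
    "p3": "Patterns in syntactic dependency networks",
}
CITATIONS = [("p1", "p2")]

results = search("random walk", concept_scores(TITLES, CITATIONS))
print(results)
```

In this toy data the query "random walk" ranks p2 first and also returns p1, whose title contains neither query word, because p1 inherits the concept through its citation link to p2. That mirrors, in a very reduced form, the idea of retrieving articles that are not limited to the exact keywords in the query.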

Advisor's Recommendation Letter
Examination Committee Approval
Chinese Abstract
Abstract
Acknowledgments
Contents
List of Tables
List of Figures
1 Introduction
2 Background
 2.1 Citation Network Analysis
 2.2 Document Representation
 2.3 Scoring Function
 2.4 Graph-based Retrieval
3 Method
 3.1 Concept Graph
 3.2 Concept Scores
 3.3 Conceptual Retrieval
4 Implementation
5 Experiment
 5.1 Dataset Evaluation
 5.2 Scoring Function Evaluation
 5.3 Evaluation Methods
  5.3.1 Data
 5.4 Results
6 Conclusion
References

                                

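Full-text release date: 2020/07/28 (campus network)
Full-text release: not authorized for public release (off-campus network)
Full-text release: not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)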
