Graduate student: 蕭如民
Ju-Min Hsiao
Thesis title: 基於點擊注意圖的類神經資訊檢索模型
Click-Attention Graph for Neural Information Retrieval
Advisor: 徐俊傑
Chiun-Chieh Hsu
Oral defense committee: 賴源正
Yuan-Cheng Lai
Yue-Li Wang
Degree: Master's
Department: College of Management, Department of Information Management
Year of publication: 2020
Academic year of graduation: 108
Language: Chinese
Number of pages: 48
Chinese keywords: 資訊檢索, 檢索點擊紀錄, 點擊圖, 類神經網路, 圖神經網路, 注意力機制
English keywords: Information Retrieval, Click Log, Click Graph, Neural Network, Graph Neural Network, Attention Mechanism
隨著類神經網路相關理論與技術的蓬勃發展,使其在許多資訊相關領域都佔據了相當重要的地位,資訊檢索領域也包含其中。但若要訓練出一個擁有良好表現的類神經網路模型,如何獲取大量的經過標注的訓練資料往往是一道難以避開的難題,其往往需要花費大量成本才能取得,並不便於實際環境中的應用。因此許多基於類神經網路的資訊檢索(Neural IR)相關研究將目光投射至使用者在檢索系統中留下來的檢索點擊紀錄,以此種便於取得、常被視為使用者的隱性相關回饋的資料來訓練模型。
目前已有許多Neural IR的相關研究藉由檢索點擊紀錄訓練模型,但他們往往只是將此種資料當作一種容易取得的標注訓練資料的替代品,忽略了許多先前研究從中發現的豐富資訊,同時也沒有注意其中所隱藏的雜訊與資料稀疏等問題。
本研究提出了一種新的基於點擊注意圖的類神經資訊檢索模型CAGNIR (Click-Attention Graph for Neural Information Retrieval),利用圖神經網路讓查詢詞與文件在點擊圖中從周遭鄰居聚合相關資訊,使自身能藉由過往的點擊關係獲得更豐富、更完整的表徵向量,從而降低查詢詞與文件之間常有的語意落差(semantic gap),並減緩檢索點擊紀錄中資料稀疏問題的影響;同時CAGNIR在聚合的過程中也運用了多視角的注意力機制,讓各節點能從多種面向來衡量自身與鄰居的相關程度,並動態地評估各面向在該次聚合中的重要性。最後,本研究也透過實驗來衡量CAGNIR的表現,並藉由將CAGNIR與相關模型所評估的相關程度結果視覺化來比較其間的差異,驗證CAGNIR確實能藉由點擊注意圖得出更完善的相關程度評估方法,使曾與查詢詞有點擊關係、或在點擊圖中距離較近的文件能有較好、較合理的排序。

With the rapid development of neural network theory and technology, neural networks have come to play an important role in many information-related fields, and information retrieval is no exception. However, training a neural network that performs well requires a large amount of labeled training data, which is costly to obtain and often infeasible in practice. Many studies on neural-network-based information retrieval (Neural IR) have therefore turned to the click logs that users leave in retrieval systems as training data, since these logs are easy to obtain and are often regarded as implicit relevance feedback from users.
Many Neural IR studies already train models on click logs, but they often treat this data merely as an easily obtained substitute for labeled training data, ignoring the rich information that earlier research has found in click logs as well as the noise and sparsity hidden in them.
This research proposes a new Neural IR model, CAGNIR (Click-Attention Graph for Neural Information Retrieval), which uses a graph neural network to let queries and documents aggregate relevant information from their neighbors in the click graph. Through past click relationships, queries and documents obtain richer and more complete representations, which reduces the semantic gap that often exists between them and mitigates the effect of sparsity in the click logs. CAGNIR also uses a multi-view attention mechanism during aggregation, so that each node can measure its relevance to its neighbors from multiple perspectives and dynamically evaluate the importance of each perspective in that aggregation. Finally, this research measures the performance of CAGNIR through experiments and visualizes the relevance scores produced by CAGNIR and related models to compare them, verifying that CAGNIR does learn a more complete way to assess relevance from the Click-Attention Graph, so that documents that have been clicked for a query, or that are close to it in the click graph, receive better and more reasonable rankings.
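To make the aggregation idea concrete, the following is a minimal sketch of one multi-view attention step over click-graph neighbors. It is not the CAGNIR implementation: the abstract gives no formulas, so the dot-product scoring, the per-view dimension masks (`view_masks`), the view-importance score, and the averaging residual are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def aggregate(node, neighbors, view_masks):
    """One aggregation step for a single node (hypothetical formulation).

    node:       embedding of a query or document
    neighbors:  embeddings of nodes linked to it by clicks
    view_masks: one element-wise mask per view (assumed stand-in for
                learned per-view projections)
    """
    view_summaries, view_scores = [], []
    for mask in view_masks:
        # Each view re-weights the node's dimensions before scoring neighbors.
        viewed = [m * x for m, x in zip(mask, node)]
        attn = softmax([dot(viewed, nb) for nb in neighbors])
        summary = weighted_sum(attn, neighbors)
        view_summaries.append(summary)
        # Assumed importance score: how well this view's summary matches the node.
        view_scores.append(dot(node, summary))
    # Weigh the views against each other for this particular aggregation.
    view_weights = softmax(view_scores)
    neighborhood = weighted_sum(view_weights, view_summaries)
    # Assumed residual combination: average of the node and its neighborhood.
    return [(x + y) / 2 for x, y in zip(node, neighborhood)]

# Toy click graph: query "q1" was clicked through to documents "d1" and "d2".
embeddings = {"q1": [1.0, 0.0], "d1": [0.9, 0.1], "d2": [0.0, 1.0]}
clicks = {"q1": ["d1", "d2"]}
masks = [[1.0, 1.0], [1.0, 0.0]]  # two illustrative views

q1_updated = aggregate(
    embeddings["q1"], [embeddings[d] for d in clicks["q1"]], masks
)
```

In this sketch the updated query vector moves toward the documents it was clicked with, which is the mechanism the abstract credits with narrowing the semantic gap; a trained model would learn the view projections and scoring functions rather than fixing them by hand.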

