
Graduate student: 陳秋宜
Chiu-Yi Chen
Thesis title: 利用社群互動模式在資料不完整的人際網路裡偵測利益衝突
Conflict of Interest Detection in Incomplete Collaboration Network via Social Interaction
Advisor: 李漢銘
Hahn-Ming Lee
Oral defense committee: 何建明
Jan-Ming Ho
莊庭瑞
Tyng-Ruey Chuang
林豐澤
Feng-Tse Lin
李育杰
Yuh-Jye Lee
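Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 60
Keywords (Chinese): 人際網路, 遺失資料, 利益衝突, 預測連結, 合作網路
Keywords (English): social network, missing data, conflict of interest, link prediction, collaboration network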

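A conflict of interest is a situation in which a person's circumstances lead them to make biased decisions about others. Conflict of interest detection is widely applied in many fields; this thesis examines the forms that conflicts of interest take in general academic activities. Collecting and mining documents is an important part of conflict of interest detection. Unfortunately, most researchers overlook the fact that data collected from the web are usually incomplete, owing to human error and privacy protection; most importantly, there is no guarantee that data gathered from the Internet are complete. As a result, not everyone with a conflicting relationship can be detected in the constructed academic collaboration network, and for systems that require strict conflict of interest detection, the missing relationships can lead to unexpected results.
This thesis focuses on using social interaction patterns to detect conflict of interest relationships when data are incomplete. In academic collaboration, we observe that (1) most people prefer to collaborate with higher-ranking scholars in the same or similar research fields, and (2) people often get to know one another through friends of friends and then go on to collaborate. We apply these observed interaction patterns to detect missing conflict relationships. We first build a collaboration network from the available data, and then infer the missing relationships from the specific network patterns we observed. Experimental results show that the proposed method can recover up to 95% of conflict of interest relationships. In addition, our method can find most relationships from a single data source, replacing the tedious work of integrating different data sources.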


Conflict of interest (COI) detection for social contacts is a fundamental data mining
task that involves building a collaboration network. Unfortunately, social network data
are often incomplete. Moreover, due to privacy issues, it is impractical to access several
information sources. In applications that require strict COI detection, unrevealed
relationships caused by incomplete data may be problematic. For such applications, a
missing-relationship recovery algorithm with high recall is needed. In this paper, we
study the problem of recovering missing relationships that underlie social interaction
in the academic community. From the topology of the collected social network, we observe
that 1) people like to cooperate with authoritative researchers in the same or similar
domains; 2) people who are friends, or friends of friends, are more likely to cooperate
if they share the same research interests. Based on these observations, potential
relationships are detected by discovering hidden social interaction among disconnected
nodes that have specific types of connectivity. Our experimental results show that our
approach can recover up to 95% of the relationships in COI detection, which means we can
discover most of the missing relationships from a single source.
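
The two observations in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm, only a toy approximation under stated assumptions: "authoritative" is approximated by a degree threshold (`authority_degree`, a hypothetical parameter), research interests are given as hand-written keyword sets, and the author names, papers, and withheld edges are invented for illustration. The sketch flags unlinked pairs that share an interest and either have a common co-author or include a high-degree author, then measures recall against a set of withheld relationships.

```python
from itertools import combinations


def build_graph(papers):
    """Map each author to the set of co-authors seen in `papers`."""
    graph = {}
    for authors in papers:
        for author in authors:
            graph.setdefault(author, set())
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidate_links(graph, interests, authority_degree=3):
    """Return unlinked author pairs flagged by either illustrative rule.

    Rule 1 (assumed proxy for authority): one endpoint has at least
    `authority_degree` co-authors. Rule 2: the pair shares a co-author
    (friend of a friend). Both rules also require a shared interest.
    """
    found = set()
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already linked, nothing to recover
        if not interests.get(u, set()) & interests.get(v, set()):
            continue  # no shared research interest
        friend_of_friend = bool(graph[u] & graph[v])
        near_authority = max(len(graph[u]), len(graph[v])) >= authority_degree
        if friend_of_friend or near_authority:
            found.add((u, v))
    return found


def recall(predicted, hidden):
    """Fraction of withheld relationships that were recovered."""
    return len(predicted & hidden) / len(hidden) if hidden else 0.0


# Invented toy data: observed papers, interest keywords, and withheld edges.
papers = [("Ann", "Bob"), ("Ann", "Cid"), ("Ann", "Dee"), ("Bob", "Eve")]
interests = {
    "Ann": {"graphs"},
    "Bob": {"graphs"},
    "Cid": {"graphs"},
    "Dee": {"privacy"},
    "Eve": {"graphs"},
}
hidden = {("Ann", "Eve"), ("Cid", "Eve")}

predicted = candidate_links(build_graph(papers), interests)
print(sorted(predicted), recall(predicted, hidden))
```

Pairs are stored as alphabetically ordered tuples so that predicted and withheld relationships can be compared directly as sets. Lowering `authority_degree` flags more pairs, which tends to raise recall at the cost of more false candidates.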

ABSTRACT
ACKNOWLEDGEMENTS
1 Introduction
1.1 Motivation
1.2 Challenges
1.3 Related Work
1.4 Proposed Concept
1.5 Contribution/Goals
1.6 Outlines of the Thesis
2 Background
2.1 Conflict of Interest
2.2 Missing Data
2.2.1 The Boundary Specification Problem
2.3 Social Network Analysis
2.3.1 Degree Centrality
2.3.2 Betweenness Centrality
2.3.3 Closeness Centrality
2.4 Link Prediction
3 Missing Relationship Finder
3.1 Notation Definition
3.2 System Architecture
3.2.1 Insufficient Network Building Stage
3.2.2 Missing Relationships Finding Stage
3.3 Overview the proposed system
4 Experiments
4.1 Experiment Setup
4.2 Data Collections
4.3 Experimental Methodology and Metric
4.3.1 Missing Edge in Complete Graph
4.3.2 k-associated Path Detection
4.3.3 Relationship Constraints
4.4 Performance Comparison
4.5 Characteristic Analysis
5 Conclusion and Further Work
5.1 Conclusion
5.2 Further Work

