研究生: |
黃凱麟 Kai-Lin Huang |
---|---|
論文名稱: |
應用 Bloom Filter 於基因序列比對之研究 Exploiting Bloom Filter for Solving the Gene Sequence Matching Problem |
指導教授: |
陳維美
Wei-Mei Chen |
口試委員: |
陳維美
Wei-Mei Chen 林昌鴻 Chang-Hong Lin 林淵翔 Yuan-Hsiang Lin 吳晉賢 Chin-Hsien Wu |
學位類別: |
碩士 Master |
系所名稱: |
電資學院 - 電子工程系 Department of Electronic and Computer Engineering |
論文出版年: | 2023 |
畢業學年度: | 111 |
語文別: | 中文 |
論文頁數: | 49 |
中文關鍵詞: | 基因定序 、平行演算法 |
外文關鍵詞: | Bloom Filter, DNA Sequence |
相關次數: | 點閱:150 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
提升在 DNA 測序 (DNA sequencing) 問題中判斷一個子字串 k-mer 是否存在於一個 DNA 資料集當中的效率,可以大幅度提升生物資訊學 (bioinformatics) 相關演算法的效能。本論文提出 GBFd 演算法將資料壓縮存儲至多維陣列所建立出的一種變形 Bloom Filter,與 Standard Bloom Filter(SBF) 相比,降低運算時間以及有較少的偽陽性率 (false positive rate)。GBFd 實驗結果顯示,可以有效提升演算法執行效率,加速比對之運算時間。
Improving the efficiency of determining whether a k-mer exists in a DNA sequencing
problem can significantly improve the efficiency of bioinformatics related algorithms. Inthis paper, GBFd parallel algorithm is proposed to compress and store data into multidimensional array to build a variant Bloom Filter. Compared with the Standard Bloom Filter(SBF), it reduces the execution time and has lower false positive rate. The experimental results of GBFd show that it can effectively improve the execution efficiency and speed up the execution time on GPU.
[1] B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.
[2] R. Patgiri, S. Nayak, and S. K. Borgohain, “rdbf: A r-dimensional bloom filter for
massive scale membership query,” Journal of Network and Computer Applications,
vol. 136, pp. 100–113, 2019.
[3] R. Rozov, R. Shamir, and E. Halperin, “Fast lossless compression via cascading
bloom filters,” in BMC bioinformatics, vol. 15, pp. 1–8, BioMed Central, 2014.
[4] K. Malde and B. O’Sullivan, “Using bloom filters for large scale gene sequence
analysis in haskell,” in Practical Aspects of Declarative Languages (A. Gill and
T. Swift, eds.), (Berlin, Heidelberg), pp. 183–194, Springer Berlin Heidelberg, 2009.
[5] H. Stranneheim, M. Käller, T. Allander, B. Andersson, L. Arvestad, and J. Lundeberg, “Classification of dna sequences using bloom filters,” Bioinformatics, vol. 26,no. 13, pp. 1595–1600, 2010.
[6] D. Pellow, D. Filippova, and C. Kingsford, “Improving bloom filter performance
on sequence data using k-mer bloom filters,” Journal of Computational Biology,
vol. 24, no. 6, pp. 547–557, 2017.
[7] V. Nygaard, A. Løland, M. Holden, M. Langaas, H. Rue, F. Liu, O. Myklebost,
Ø. Fodstad, E. Hovig, and B. Smith-Sørensen, “Effects of mrna amplification on
gene expression ratios in cdna experiments estimated by analysis of variance,” Bmc
Genomics, vol. 4, no. 1, pp. 1–13, 2003.
[8] J. R. Miller, S. Koren, and G. Sutton, “Assembly algorithms for next-generation
sequencing data,” Genomics, vol. 95, no. 6, pp. 315–327, 2010.
[9] P. Pandey, M. A. Bender, R. Johnson, and R. Patro, “Squeakr: an exact and approximate k-mer counting system,” Bioinformatics, vol. 34, pp. 568–575, 10 2017.
[10] A.-A. Mamun, S. Pal, and S. Rajasekaran, “KCMBT: a k-mer Counter based on
Multiple Burst Trees,” Bioinformatics, vol. 32, pp. 2783–2790, 06 2016.
[11] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, “Bigtable: A distributed storage system for structured data,” ACM Trans. Comput. Syst., vol. 26, jun 2008.
[12] A. Lakshman and P. Malik, “Cassandra: A decentralized structured storage system,”
SIGOPS Oper. Syst. Rev., vol. 44, p. 35–40, apr 2010.
[13] J. H. Mun and H. Lim, “New approach for efficient ip address lookup using a bloom
filter in trie-based algorithms,” IEEE Transactions on Computers, vol. 65, no. 5,
pp. 1558–1565, 2016.
[14] M. Antikainen, T. Aura, and M. Särelä, “Denial-of-service attacks in bloomfilter-based forwarding,” IEEE/ACM Transactions on Networking, vol. 22, no. 5,
pp. 1463–1476, 2014.
[15] S. Geravand and M. Ahmadi, “Bloom filter applications in network security: A stateof-the-art survey,” Computer Networks, vol. 57, no. 18, pp. 4047–4064, 2013.
[16] A. Kirsch and M. Mitzenmacher, “Less hashing, same performance: Building a better bloom filter,” Random Structures & Algorithms, vol. 33, no. 2, pp. 187–218,
2008.
[17] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, “Summary cache: a scalable wide-area
web cache sharing protocol,” IEEE/ACM transactions on networking, vol. 8, no. 3,
pp. 281–293, 2000.
[18] B. Fan, D. G. Andersen, M. Kaminsky, and M. D. Mitzenmacher, “Cuckoo filter: Practically better than bloom,” in Proceedings of the 10th ACM International
on Conference on emerging Networking Experiments and Technologies, pp. 75–88,
2014.
[19] P. S. Almeida, C. Baquero, N. Preguiça, and D. Hutchison, “Scalable bloom filters,” Information Processing Letters, vol. 101, no. 6, pp. 255–261, 2007.
[20] T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis, “The case for learned index structures,” in Proceedings of the 2018 international conference on management
of data, pp. 489–504, 2018.
[21] F. Xu, P. Liu, J. Xu, J. Yang, and S.-M. Yiu, “Multi-dimensional bloom filter: design and evaluation,” IEICE TRANSACTIONS on Information and Systems, vol. 100,
no. 10, pp. 2368–2372, 2017.
[22] A. Appleby, “Murmurhash.” https://sites.google.com/site/murmurhash/,
2022.
[23] M. J. Flynn, “Some computer organizations and their effectiveness,” IEEE Transactions on Computers, vol. C-21, no. 9, pp. 948–960, 1972.
[24] NVIDIA, “CUDA Toolkit Documentation.” https://docs.nvidia.com/cuda/
index.html, 2022.
[25] F. Lin, G. Wang, J. Zhou, S. Zhang, and X. Yao, “High-performance ipv6 address
lookup in gpu-accelerated software routers,” Journal of Network and Computer Applications, vol. 74, pp. 1–10, 2016.
[26] S. Xiong, Y. Yao, M. Berry, H. Qi, and Q. Cao, “Frequent traffic flow identification through probabilistic bloom filter and its gpu-based acceleration,” Journal of Network and Computer Applications, vol. 87, pp. 60–72, 2017.
[27] C.-L. Hung, C.-Y. Lin, and P.-C. Wu, “An efficient gpu-based multiple pattern
matching algorithm for packet filtering,” J. Signal Process. Syst., vol. 86, p. 347–
358, mar 2017.
[28] R. Bhat, R. K. Thilak, and R. P. Vaibhav, “Hunting the pertinency of hash and bloom filter combinations on gpu for fast pattern matching,” International Journal of Information Technology, pp. 1–13, 2022.