Graduate student: | 姚佑達 YU-TA YAO |
---|---|
Thesis title: | 基於二階段多保真最佳化之智慧影像辨識方法 Method of Intelligent Pattern Recognition Based on Two-Stepped Multi-Fidelity Optimization |
Advisor: | 林柏廷 Po-Ting Lin |
Oral defense committee: | 吳育瑋 Yu-Wei Wu, 林其禹 Chyi-Yen Lin |
Degree: | Master |
Department: | College of Engineering - Department of Mechanical Engineering |
Year of publication: | 2020 |
Academic year of graduation: | 108 |
Language: | Chinese |
Number of pages: | 145 |
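Chinese keywords: | automated optical inspection, machine learning, artificial intelligence, frequency, neural network, multi-fidelity optimization, classifier, image recognition |
English keywords: | automatic optical inspection, machine learning, artificial intelligence, frequency, neural network, multi-fidelity optimization, classifier, pattern recognition |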
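With the rapid development of advanced fields such as Industry 4.0, smart factories, smart manufacturing, machine vision, and artificial intelligence, unmanned and automated production lines have become the current trend, and image recognition is a common requirement across all of these fields. Drawing on the K-mer two-dimensional encoding analysis of DNA sequences used in the life sciences, this thesis proposes a new K-mer frequency encoding for two-dimensional images, a highly adaptive and intelligent "artificial intelligence image recognition algorithm". Based on the occurrence probability of shape features in a 2D image, a classification code for image shape recognition is built and combined with various classifiers to establish a feature database of the images, which is then used to recognize image shapes. We first tested the method on the well-known MNIST handwritten digit database. The experiments show that the proposed K-mer two-dimensional encoding, without any complex artificial neural network learning architecture and using only a systematic encoding of the occurrence probability of image shapes, reaches about 90% recognition accuracy. Building on these digit recognition results, we then took on the Balanced split of the EMNIST handwritten character database, randomly selecting letter datasets of different sizes for testing and optimizing the parameters under joint constraints on time and accuracy. For this optimization, the thesis proposes a two-stage multi-fidelity design optimization combined with a genetic algorithm (GA), which switches back and forth between a large dataset and a small dataset when computing accuracy. This yielded 87% recognition accuracy on a dataset of 10,000 handwritten English letter images and 90% on a dataset of 60,000. The method was also compared with two commonly used features, Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP), and was found to be both more accurate and faster than either. It was likewise compared with Convolutional Neural Networks (CNN) on datasets of different sizes, again with comparatively higher accuracy. The proposed method is not affected by image scale or other geometric parameters and encodes image shape well. We believe this adaptive K-mer frequency encoding of images can be widely used in many different image analysis and feature recognition applications.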
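The abstract does not spell out how a two-dimensional K-mer is defined, so the following is only a minimal sketch under stated assumptions: the image is binarized, every k x k window is read as a k*k-bit word (the "K-mer"), and the feature vector is the normalized frequency of each possible word. The function name `kmer_frequency_2d` and the parameters `k` and `threshold` are hypothetical and are not taken from the thesis.

```python
def kmer_frequency_2d(image, k=2, threshold=0.5):
    """Return the relative frequency of every k x k binary pattern in `image`.

    Assumption (not from the thesis abstract): pixels above `threshold`
    count as 1, each k x k window is flattened row by row into a k*k-bit
    integer, and counts are divided by the number of windows.
    """
    rows, cols = len(image), len(image[0])
    if rows < k or cols < k:
        raise ValueError("image must be at least k x k")

    # Binarize once so the inner loop only touches 0/1 values.
    bits = [[1 if px > threshold else 0 for px in row] for row in image]

    counts = [0] * (2 ** (k * k))
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            code = 0
            for i in range(k):
                for j in range(k):
                    code = (code << 1) | bits[r + i][c + j]
            counts[code] += 1

    total = (rows - k + 1) * (cols - k + 1)
    return [n / total for n in counts]


# A vertical bar drawn at two different heights.
short_bar = [[0, 1, 0]] * 3
tall_bar = [[0, 1, 0]] * 5

short_vec = kmer_frequency_2d(short_bar)
tall_vec = kmer_frequency_2d(tall_bar)
```

In this sketch both bars produce the same vector, with all mass split evenly between the two window patterns that contain the bar's left and right edges. That is a property of normalizing the counts in this toy encoding; it is not a demonstration of the scale robustness reported in the thesis. A vector like this could then be passed to any standard classifier.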
With the flourishing development of advanced fields such as Industry 4.0, smart factories, smart manufacturing, machine vision, and artificial intelligence, unmanned and automated production lines have become the current trend, and pattern recognition plays an important role across these fields. Based on the K-mer frequency analysis of DNA sequences in the life sciences, this thesis proposes a new, highly adaptive K-mer frequency coding method for two-dimensional arrays, an "Artificial Intelligence Image Recognition Algorithm". According to the appearance probability of shape features in 2D images, we establish classification codes for image shape recognition and feed them into various classifiers. First, we test on the MNIST database of handwritten digits. The experimental results show that the proposed K-mer frequency coding method, using only a systematic coding of the appearance probability of image shapes and no complex neural-network learning architecture, reaches about 90% image recognition accuracy. Following the good results on digit recognition, we took on the EMNIST Balanced handwriting dataset. We randomly chose different numbers of English letter images for testing and then optimized the parameters for time and accuracy. The optimization method we propose is a two-stepped multi-fidelity optimization with a genetic algorithm, which switches between large and small datasets to calculate the accuracy. It achieved an image recognition accuracy of 87% with 10,000 English letter images and 90% with 60,000 images. Furthermore, compared with the commonly used HOG (histogram of oriented gradients) and LBP (local binary patterns) features, the method is both more accurate and faster. Compared with CNNs on datasets of different sizes, it also obtained relatively good results.
The method proposed in this paper is not affected by image scale or other geometric parameters, and has good image shape coding ability. We believe that the adaptive image K-mer frequency encoding method can be widely applied in different kinds of image analysis and feature recognition.
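The two-stepped multi-fidelity optimization is described here only at a high level, so the sketch below shows one plausible reading rather than the thesis's actual procedure: a genetic algorithm first searches the parameter space using cheap accuracy estimates from a small dataset, and the best few candidates are then re-scored on a large dataset. Everything named here is hypothetical, including `two_stage_search`, the `evaluate(params, n_samples)` callback, the toy objective `toy_accuracy`, the parameter bounds, and the dataset sizes.

```python
import random


def two_stage_search(evaluate, bounds, small_n, large_n,
                     pop_size=20, generations=15, top_k=5, seed=0):
    """Stage 1: GA scored on a small subset. Stage 2: re-score finalists on a large one.

    `evaluate(params, n_samples)` is a hypothetical callback that returns an
    accuracy estimate for integer parameters `params` using `n_samples` images.
    """
    rng = random.Random(seed)
    low_cache = {}

    def low(p):
        # Cheap, low-fidelity score, memoized so each candidate is scored once.
        if p not in low_cache:
            low_cache[p] = evaluate(p, small_n)
        return low_cache[p]

    def clip(p):
        return tuple(min(hi, max(lo, g)) for g, (lo, hi) in zip(p, bounds))

    population = [tuple(rng.randint(lo, hi) for lo, hi in bounds)
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half, refill with mutated crossovers of survivors.
        parents = sorted(population, key=low, reverse=True)[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if rng.random() < 0.3:                             # +/-1 mutation
                i = rng.randrange(len(child))
                child[i] += rng.choice((-1, 1))
            children.append(clip(child))
        population = parents + children

    # Stage 2: spend the expensive evaluations only on the top candidates.
    finalists = sorted(set(population), key=lambda p: (-low(p), p))[:top_k]
    best_score, best = max((evaluate(p, large_n), p) for p in finalists)
    return best, best_score


def toy_accuracy(params, n_samples):
    """Toy stand-in for 'train and test on n_samples images' (not real data)."""
    k, bins = params
    true_acc = 0.90 - 0.02 * (k - 3) ** 2 - 0.001 * (bins - 16) ** 2
    # Deterministic pseudo-noise that shrinks as the dataset grows.
    noise_rng = random.Random(k * 1_000_003 + bins * 101 + n_samples)
    return true_acc + noise_rng.uniform(-1.0, 1.0) / n_samples ** 0.5


best, score = two_stage_search(toy_accuracy, bounds=[(1, 6), (4, 32)],
                               small_n=1000, large_n=60000)
```

The design choice illustrated is that the many evaluations needed by the genetic algorithm are paid for at low fidelity, while the expensive large-dataset evaluation is reserved for `top_k` finalists. The abstract speaks of switching back and forth between the large and small datasets, so the real method may alternate fidelities more than once.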