Graduate Student: 張皓崴 Hao-Wei Chang
Thesis Title: 基於K-mer圖像特徵生成影像資料擴增 (K-mer-based Pattern Generation for Image Data Augmentation)
Advisor: 林柏廷 Po-Ting Lin
Oral Defense Committee: 吳育瑋 Yu-Wei Wu, 林其禹 Chyi-Yen Lin, 張敬源 Ching-Yuan Chang
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Year of Publication: 2021
Academic Year: 109
Language: Chinese
Pages: 136
Keywords (Chinese): machine learning, artificial intelligence, automated optical inspection, data augmentation, pattern recognition, K-mer-based pattern, generator network model
Keywords (English): generator model
With the rapid development of machine learning and artificial intelligence, digital image processing techniques in machine vision have been widely applied across many fields. In the progress toward Industry 4.0, automated optical inspection (AOI) is an indispensable system: computer equipment analyzes and recognizes various images to achieve an automated, unmanned production process. However, training a good image-recognition model requires a large amount of data. In the early stages of production-line development, sufficient image data may not be available, and even when it is, labeling and organizing the images costs considerable labor and time. Therefore, augmenting an existing dataset to obtain enough training data improves both the efficiency of building an image-recognition model and its performance. Image classification typically extracts image features and then classifies them with a classifier, and the K-mer-based pattern is a feature that represents image morphology well. Based on this feature, this study develops a corresponding data-augmentation method: the data are augmented in the feature dimension and then expanded to the image dimension through a generator network model, which both increases the variability of the data and creates more features for the recognition model to learn from. This study applies the K-mer-based pattern generation method to the EMNIST handwriting dataset under different dataset conditions. When every class is augmented equally, accuracy improves by about 2% on smaller training sets; as the amount of training data grows, the gain becomes less pronounced but accuracy still improves. When class sizes are unbalanced, missing data in some classes degrades the trained model's classification accuracy; filling in the under-represented classes with the proposed method recovers the lost accuracy, and in some augmentation scenarios the resulting model even outperforms the model trained on the complete, non-missing dataset. In addition, compared with augmentation by a conditional generative adversarial network, the proposed method generates effective image data on the many-class dataset, and after augmenting datasets with fewer classes it substantially improves the accuracy of the trained classification model at every dataset size.
With the growth of machine learning and artificial intelligence, machine-vision technology has been applied in many fields. Automated Optical Inspection (AOI) is essential to the development of Industry 4.0: with an AOI system, computers can recognize different kinds of images to achieve manufacturing automation. However, training a good pattern-recognition model requires a large amount of data, and acquiring and labeling that data costs considerable human effort and time. Training efficiency can therefore be improved by increasing the amount of data derived from the data already on hand, an approach called "data augmentation". Pattern recognition first extracts the features of an image and then classifies them with a classifier model; the K-mer-based pattern is a feature that represents the shape of an image well. This study develops a data-augmentation method based on K-mer-based pattern features. The method augments the data in the feature dimension and expands it to the image dimension with a neural-network model. This not only increases the variability of the data but also creates more features for the model to learn. This thesis applies the K-mer-based pattern generation method for image data augmentation to the EMNIST dataset under different conditions. When each category is augmented by an equal amount, accuracy improves by about 2% on smaller datasets; although the gain is less pronounced on larger datasets, the method still enhances model training. When certain categories contain fewer samples, the proposed augmentation is applied so that every category has the same amount of data.
Comparing the models trained on the dataset before and after augmentation shows that model performance improves after applying the proposed method to the training set. The method is also compared against augmentation with a conditional generative adversarial network (cGAN). On the dataset containing 47 categories, the proposed method generates effective training images while the cGAN generates images with too much noise; on the 10-category dataset, the K-mer-based pattern generation method improves the accuracy of the trained model more than the cGAN-based method does.
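The pipeline described above, augmenting in the feature dimension and then expanding the result back to the image dimension with a generator network, can be sketched roughly as follows. The thesis's actual K-mer feature extractor and trained generator network are not reproduced here; the interpolation scheme and the random linear `decode_to_image` stand-in are illustrative assumptions only, not the author's implementation.

```python
import random

random.seed(0)

def augment_in_feature_space(features, n_new, alpha_range=(0.1, 0.9)):
    """Feature-dimension augmentation: synthesize each new feature vector
    by interpolating between a random pair of existing same-class vectors."""
    synthetic = []
    for _ in range(n_new):
        a, b = random.sample(features, 2)
        t = random.uniform(*alpha_range)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic

# Stand-in for the trained generator network: a fixed random linear map
# from a 16-dimensional feature vector to a flat 28x28 image.
FEAT_DIM, IMG_PIXELS = 16, 28 * 28
W = [[random.gauss(0, 1) for _ in range(IMG_PIXELS)] for _ in range(FEAT_DIM)]

def decode_to_image(feature_vec):
    """Expand a synthetic feature vector back to the image dimension."""
    return [sum(f * w for f, w in zip(feature_vec, col)) for col in zip(*W)]

# Five existing feature vectors for one class -> three synthetic images.
real_feats = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(5)]
new_images = [decode_to_image(f) for f in augment_in_feature_space(real_feats, 3)]
print(len(new_images), len(new_images[0]))  # 3 784
```

In the thesis the decoder is a trained generative network rather than a random linear map; the sketch only shows where the feature-dimension step sits relative to the image-dimension step.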