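K-mer-based Pattern Generation for Image Data Augmentation | National Taiwan University of Science and Technology Electronic Theses and Dissertations System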


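Author:	Hao-Wei Chang (張皓崴)
Thesis title:	K-mer-based Pattern Generation for Image Data Augmentation (基於K-mer圖像特徵生成影像資料擴增)
Advisor:	Po-Ting Lin (林柏廷)
Committee members:	Yu-Wei Wu (吳育瑋), Chyi-Yen Lin (林其禹), Ching-Yuan Chang (張敬源)
Degree:	Master
Department:	College of Engineering - Department of Mechanical Engineering
Year of publication:	2021
Academic year of graduation:	109 (ROC academic year)
Language:	Chinese
Pages:	136
Chinese keywords:	machine learning, artificial intelligence, automated optical inspection, data augmentation, image recognition, K-mer-based pattern, generator model
Foreign-language keywords:	generator model
Usage:	Views: 219, Downloads: 0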

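With the rapid development of machine learning and artificial intelligence, digital image processing techniques in machine vision have been widely applied across many fields. In the progress of Industry 4.0, automated optical inspection (AOI) is an indispensable system: computers analyze and recognize various kinds of images to achieve automated, unmanned production. However, training a good image recognition model requires a large amount of data. In the early stages of production-line development, sufficient image data may not be available, and even when it is, labeling and organizing the images costs labor and time. Therefore, augmenting an existing dataset to obtain enough data for model training can improve both the efficiency of building an image recognition model and the model's performance. Image classification usually requires extracting image features and then classifying them with a classifier, and the K-mer-based pattern is a feature that represents image shape well. Based on this feature, this study develops a corresponding data augmentation method that augments data in the feature dimension and then uses a generator network model to expand the augmented data to the image dimension, which not only increases variability but also creates more features for the recognition model to learn from. This study applies the K-mer-based pattern generation method for image augmentation to the EMNIST handwriting dataset and augments the dataset under different conditions. When the amount of data in every class is augmented equally, applying the method to a small training dataset improves accuracy by about 2%; as the size of the training dataset increases, the gain becomes less pronounced but accuracy still improves. When class sizes are imbalanced, missing data in some classes lowers the classification accuracy of the trained model; using the proposed method to fill in the under-represented classes recovers the lost accuracy, and in some cases where certain classes are augmented, the trained model even outperforms a model trained on the data with nothing missing. In addition, in a comparison with augmentation by conditional generative adversarial networks, the proposed method generates effective image data on a dataset with many classes. After augmentation of a dataset with fewer classes, the accuracy of the trained classification model improves substantially for datasets of every size.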

With the growth of machine learning and artificial intelligence, machine vision technology has been applied in many fields. Automated optical inspection (AOI) is essential to the development of Industry 4.0. With an AOI system, computers can recognize different kinds of images to achieve manufacturing automation. However, training a good pattern recognition model requires a large amount of data, and acquiring and labeling that data takes considerable human effort and time. Training efficiency can therefore be improved by increasing the amount of data using the data already collected, an approach called “data augmentation”. In pattern recognition, features are first extracted from the pattern and then classified by a classifier model. The K-mer-based pattern is a feature that represents the shape of a pattern well. This study develops a data augmentation method based on the features of the K-mer-based pattern. The method augments the data in the feature dimension and expands them to the image dimension with a neural network model. This not only increases the variability of the data but also creates more features for the model to learn. This paper applies K-mer-based pattern generation for image data augmentation to the EMNIST dataset under different conditions. When every category is augmented by an equal amount, accuracy improves by about 2% on smaller datasets. With larger datasets the gain is less pronounced, but the method still improves the trained model. When certain categories have less data, we apply our data augmentation method so that every category has the same amount of data.
Comparing models trained on the dataset before and after augmentation, the results show that model performance improves after applying our method to the training dataset. We also compare this method with data augmentation by conditional generative adversarial nets (cGAN). On the dataset with 47 categories, our method generates effective images for model training, whereas the cGAN model generates images with too much noise. On the dataset with 10 categories, K-mer-based pattern generation improves the accuracy of the trained model more than the cGAN method does.
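The idea of topping up under-represented categories in feature space can be illustrated with a minimal sketch. This record does not give the thesis's actual augmentation rule, so the example below substitutes a simple SMOTE-like interpolation between two feature vectors of the same class; the function name `balance_by_interpolation`, the toy feature vectors, and the labels are all hypothetical and are not taken from the thesis.

```python
import random
from collections import defaultdict


def balance_by_interpolation(features, labels, seed=0):
    """Top up every class to the size of the largest class.

    Each synthetic vector is a random convex combination of two real
    feature vectors from the same class. This is a SMOTE-like stand-in,
    not the augmentation rule used in the thesis.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for vec, lab in zip(features, labels):
        by_class[lab].append(list(vec))

    # Every class is grown to the size of the largest class.
    target = max(len(vecs) for vecs in by_class.values())
    out_features, out_labels = [], []
    for lab, vecs in by_class.items():
        synthetic = []
        while len(vecs) + len(synthetic) < target:
            a, b = rng.choice(vecs), rng.choice(vecs)
            t = rng.random()  # interpolation weight in [0, 1)
            synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
        out_features.extend(vecs + synthetic)
        out_labels.extend([lab] * target)
    return out_features, out_labels


# Toy data (hypothetical): class "A" has 4 feature vectors, class "B" only 2.
feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 7.0]]
labs = ["A", "A", "A", "A", "B", "B"]
aug_feats, aug_labs = balance_by_interpolation(feats, labs)
```

After the call, both toy classes hold the same number of feature vectors. In the approach the abstract describes, augmented features would then be passed through a generator network to produce images; that step depends on the trained model and is not sketched here.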

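Abstract (Chinese)
ABSTRACT
Acknowledgments
Table of Contents
List of Symbols
List of Figures and Tables
Chapter 1. Introduction
1 Preface
2 Motivation
3 Thesis Organization
Chapter 2. Theoretical Background
1 Image Datasets
2 K-mer-based Pattern
2.1 K-mer Frequency Classification Method
2.2 Image K-mer Encoding
3 Artificial Neural Networks
3.1 Convolutional Neural Networks
3.2 Autoencoders
4 Image Data Augmentation
4.1 Geometric Transformations
4.2 Image Mixing
4.3 Generative Adversarial Networks
4.4 Conditional Generative Adversarial Networks
4.5 Feature-Space Data Augmentation
Chapter 3. Methodology
1 Image Preprocessing
1.1 Image Filtering
1.2 Image Binarization
1.3 Geometric Center
1.4 Image Preprocessing Pipeline
2 Application of the K-mer-based Pattern
2.1 Angle Feature Concatenation
2.2 Mean-Based Feature Extraction
2.3 Feature Data Augmentation Methods
2.4 Inverse Image Generation
3 Artificial Neural Network Models
3.1 Fully Connected Classifier
3.2 K-mer Inverse Generation Model
3.3 Convolutional Neural Network Model
4 Classification Evaluation Metrics
5 Experimental Procedure
Chapter 4. Experimental Results
1 Parameter Setting Experiments
1.1 K-mer Angle Feature Concatenation
1.2 Image Information and Mean Mask
1.3 Comparison of Feature Augmentation Methods
2 K-mer-based Pattern Data Augmentation
2.1 Augmentation with Balanced Class Sizes
2.2 Augmentation with Imbalanced Class Sizes
3 Comparison with Conditional Generative Adversarial Networks
Chapter 5. Conclusions and Future Work
1 Conclusions
2 Future Work
References
Appendix A
Appendix B
Appendix C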

                                


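Full-text release date: 2024/08/05 (campus network)
Full-text release date: full text not authorized for public release (off-campus network)
Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)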
