
Graduate Student: Sin-Siang Lin (林新翔)
Thesis Title: Kmer-based Deep Learning for Pattern Recognition of Rotated Images (基於K-mer深度學習於旋轉圖像之影像辨識方法)
Advisor: Po-Ting Lin (林柏廷)
Committee Members: Yu-Wei Wu (吳育瑋), Chyi-Yeu Lin (林其禹), Ching-Yuan Chang (張敬源)
Degree: Master
Department: Department of Mechanical Engineering, College of Engineering
Year of Publication: 2021
Academic Year of Graduation: 109 (ROC calendar)
Language: Chinese
Number of Pages: 122
Chinese Keywords: 圖像辨識 (image recognition), 深度學習 (deep learning), 人工智慧 (artificial intelligence), 類神經網路 (neural networks), 卷積類神經網路 (convolutional neural networks), 特徵融合 (feature fusion)
English Keywords: pattern recognition, deep learning, machine learning, artificial intelligence, convolution neural network, feature fusion
Since the LeNet model was published in 1998, image recognition has grown increasingly popular and mature, and its fields of application and demand have kept expanding. Recently, the method of K-mer-based Pattern Recognition (KPR) was developed in the field of image processing: an image K-mer frequency encoding derived from the two-dimensional K-mer encoding analysis of DNA sequences in the biomedical field. In KPR, multiple sampling arrays of length K are taken from the center of an image pattern toward the perimeter of its maximum range, and the frequencies of the K-mer-based sampling strings are used to build a dataset for training and image recognition. This thesis optimizes the image K-mer frequency encoding: different masks are convolved when extracting the K-mer values to add a blurring effect, upgrading point sampling to a local convolution at each sampling point. A new convolutional K-mer-based pattern recognition method, the K-mer-based Deep Learning (KDL) model, is proposed to improve the effectiveness of KPR, and the sampled encoding is fused into a neural network to improve its classification performance. The method is tested on datasets of images rotated by different angles and is finally compared with the existing neural network models LeNet and AlexNet. The MNIST handwritten digit image database is used for testing. When testing the original images, the recognition accuracy reaches 92.24%; although this is lower than that of the neural network models, in the rotated-image cases the method performs far better than the existing neural network models, reaching 71.8% accuracy on the ±135° rotated dataset, higher than LeNet's 45.16% and AlexNet's 47.54%. Compared with existing neural network models, the proposed method is less affected by rotation of the test images and shows good capability for recognizing rotated image patterns. We believe this adaptive image K-mer encoding can be widely applied to many kinds of image analysis and feature recognition.
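The following is a minimal, illustrative Python/NumPy sketch of one plausible reading of the encoding described above: gray values are sampled along rays from the image center toward the perimeter, each sample is blurred by a small local mean (a stand-in for the convolutional masks mentioned above), the ray is quantized into a small alphabet, and the frequencies of its length-K substrings form the feature vector. The function name, parameters, and the 3x3 mean mask are assumptions for illustration, not the implementation used in the thesis.

    import numpy as np
    from collections import Counter

    def kmer_frequency_encoding(image, k=3, n_rays=36, levels=4):
        """Encode a 2D grayscale image (values 0-255) as a K-mer frequency vector."""
        h, w = image.shape
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
        radius = min(h, w) / 2.0 - 1
        counts = Counter()
        for i in range(n_rays):
            theta = 2 * np.pi * i / n_rays
            # Sample positions along one ray from the center toward the perimeter.
            ts = np.linspace(0.0, 1.0, int(radius))
            ys = np.clip(np.round(cy + ts * radius * np.sin(theta)).astype(int), 0, h - 1)
            xs = np.clip(np.round(cx + ts * radius * np.cos(theta)).astype(int), 0, w - 1)
            vals = []
            for y, x in zip(ys, xs):
                # KDL-style local blurring (here a plain 3x3 mean) replaces
                # single-pixel sampling at each sample point.
                patch = image[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
                vals.append(patch.mean())
            # Quantize the blurred gray values along the ray into a small alphabet.
            symbols = np.digitize(np.asarray(vals), np.linspace(0, 255, levels + 1)[1:-1])
            # Count every length-K substring (K-mer) along the ray.
            for j in range(len(symbols) - k + 1):
                counts[tuple(int(s) for s in symbols[j:j + k])] += 1
        # Normalize into a fixed-length frequency vector over all possible K-mers.
        total = sum(counts.values()) or 1
        return np.array([counts[kmer] / total for kmer in np.ndindex(*([levels] * k))],
                        dtype=np.float32)

Because the rays cover all directions around the center, the resulting frequency vector changes relatively little when the input pattern is rotated, which is the property the abstract attributes to the K-mer encoding.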


In 1998, LeCun proposed an improved version of ConvNet, famously known as LeNet-5, which started the use of CNNs for classifying characters in pattern recognition. Recently, the method of K-mer-based Pattern Recognition (KPR) was developed in the field of image processing; the K-mer concept was adopted from DNA sequencing in the biomedical field. In KPR, multiple arrays of samplings with length K were taken from the center of an image pattern toward the perimeter of its maximum range, and the frequencies of the K-mer-based sampling strings were utilized to build a classification set for training and pattern recognition. In this paper, a new method of K-mer-based Deep Learning (KDL) for pattern recognition of rotated images was developed to enhance the effectiveness of KPR. Convolution processes were applied to add blurring during K-mer-based sampling, upgrading point sampling to local integration at each sampling point. The sampled code was integrated into a neural network to improve its classification performance on rotated images. The proposed method was also compared with existing neural network models such as LeNet and AlexNet. The MNIST handwritten digit image database was used for evaluation. For the original images, the recognition accuracy reached about 92.24%; although this is lower than that of the neural network models, the performance on rotated images is much better. On the ±135° rotated dataset, the accuracy reached about 71.8%, higher than LeNet's 45.16% and AlexNet's 47.54%. Compared with the existing neural network models, the proposed method is far less affected by image rotation and is better at recognizing rotated images. Therefore, we believe that this adaptive KDL image-encoding method can be widely applied in many kinds of image analysis and feature recognition.
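As a complementary illustration of the feature-fusion step mentioned in the abstract, below is a minimal PyTorch sketch in which a LeNet-style convolutional branch processes the raw 28x28 image while a small fully connected branch processes the K-mer frequency vector, and the two feature vectors are concatenated before the classifier. The class name, layer sizes, and fusion point are assumptions for illustration, not the hybrid architectures evaluated in the thesis.

    import torch
    import torch.nn as nn

    class KmerFusionNet(nn.Module):
        """Hypothetical fusion of a CNN image branch with a K-mer frequency branch."""

        def __init__(self, kmer_dim=64, n_classes=10):
            super().__init__()
            # Convolutional branch for a 1x28x28 grayscale image (LeNet-like).
            self.conv = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
                nn.Flatten(),  # -> 16 * 5 * 5 = 400 features
            )
            # Fully connected branch for the K-mer frequency vector
            # (kmer_dim=64 matches levels**k = 4**3 in the earlier sketch).
            self.kmer_fc = nn.Sequential(nn.Linear(kmer_dim, 64), nn.ReLU())
            # Classifier over the concatenated (fused) features.
            self.classifier = nn.Sequential(
                nn.Linear(400 + 64, 120), nn.ReLU(),
                nn.Linear(120, n_classes),
            )

        def forward(self, image, kmer_vec):
            fused = torch.cat([self.conv(image), self.kmer_fc(kmer_vec)], dim=1)
            return self.classifier(fused)

    # Example usage with random tensors standing in for a batch of 8 samples:
    # logits = KmerFusionNet()(torch.randn(8, 1, 28, 28), torch.randn(8, 64))

Concatenation is only one plausible fusion scheme; the table of contents below also lists separate LeNet + Kmer and AlexNet + Kmer hybrid designs.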

Chapter 1: Introduction
  1.1 Foreword
  1.2 Motivation
  1.3 Thesis Structure
Chapter 2: Theoretical Foundations of KDL
  2.1 Image Databases
  2.2 Multilayer Perceptron
    2.2.1 Activation Functions
    2.2.2 Loss Functions
    2.2.3 Back Propagation
    2.2.4 Gradient Descent
  2.3 Convolutional Neural Networks
    2.3.1 Convolution Layer
    2.3.2 Pooling Layer
    2.3.3 Dropout
    2.3.4 Fully Connected Layer
  2.4 K-mer Frequency Classification Method
  2.5 Evaluation Metrics
  2.6 K-fold Cross-Validation
Chapter 3: Design of KDL
  3.1 Rotated Datasets
  3.2 Optimization of K-mer Image Encoding
    3.2.1 K-mer Grayscale Encoding Extraction
    3.2.2 Convolutional Blurring
  3.3 Convolutional Neural Network Models
    3.3.1 LeNet
    3.3.2 AlexNet
  3.4 Image Fusion
  3.5 K-mer Network
  3.6 Hybrid Network Designs
    3.6.1 LeNet + Kmer
    3.6.2 AlexNet + Kmer
  3.7 Overall Architecture
Chapter 4: Experimental Results
  4.1 Effect of Grayscale Value Ordering on K-mer Encoding
  4.2 Comparison of KDL and KPR
  4.3 Comparison of Standard K-mer Encoding and Convolutional K-mer Encoding
  4.4 Performance Comparison of Neural Network Models on Rotated Datasets
    4.4.1 LeNet
    4.4.2 AlexNet
  4.5 Effect of Large Rotation Angles on Each Class
  4.6 Summary Comparison of All Models
Chapter 5: Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work

