
Graduate Student: Achmad Lukman
Thesis Title: Deep Learning Approach for Object Recognition and Knowledge Transfer
Advisor: Chuan-Kai Yang (楊傳凱)
Committee Members: Yuan-Cheng Lai, Bor-Shen Lin, Min-Te (Peter) Sun (孫敏德), Yen-Hung Chen
Degree: Doctor
Department: Department of Information Management, School of Management
Publication Year: 2023
Graduation Academic Year: 111
Language: English
Number of Pages: 65
Keywords: Angular view, Full deep distillation mutual learning, Half deep distillation mutual learning

Abstract: Computer vision and machine learning can be combined to solve object recognition tasks: computer vision techniques extract features from image datasets, and machine learning uses those features to learn a model that is then deployed on unseen test examples. The convolutional neural network (CNN) is one of the most popular methods for image classification; as a deep learning model it learns feature extraction and classification jointly, so it can be used without designing a separate feature extraction step.
    In this thesis, we propose methods that improve the performance of convolutional neural networks in two scenarios: object recognition and knowledge transfer learning. The first part of the thesis addresses how to collect specific 3D data for object recognition: we present an angular view algorithm together with an integrated system that uses incremental learning in a new way. In the second part, we study how to enhance deep mutual learning by combining it with knowledge distillation for the knowledge transfer problem, yielding two new approaches. First, we developed the angular view algorithm and a rotating display tool to build a new dataset, and used it to recognize images captured both inside and outside the lab (a capture sketch follows this abstract). Second, we developed an integrated system that combines GoogLeNet and the angular view algorithm with two existing pre-trained models, AlexNet and VGG16; with incremental training, this system can recognize a new task without forgetting the old ones stored in the pre-trained weights. Finally, to increase deep learning performance under the knowledge transfer paradigm, we formulated two new methods, Full Deep Distillation Mutual Learning (FDDML) and Half Deep Distillation Mutual Learning (HDDML) (a loss sketch also follows), and conducted extensive experiments on three open-source datasets: CIFAR-100, TinyImageNet, and Cinic-10.
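    As a rough illustration of the data-collection step, the following is a minimal sketch in Python, assuming the angular view scheme amounts to capturing one frame per fixed angular step of the rotating display. The step size and the rotate_display/capture_frame helpers are hypothetical stand-ins, not the thesis's actual implementation.

    def capture_angular_views(rotate_display, capture_frame, step_deg=15):
        # Rotate the display in fixed increments and grab one image per view.
        # `rotate_display` and `capture_frame` are hypothetical callables
        # standing in for the turntable hardware and the camera.
        views = []
        for angle in range(0, 360, step_deg):
            rotate_display(angle)                    # position the display at `angle` degrees
            views.append((angle, capture_frame()))   # record the (viewpoint, image) pair
        return views                                 # e.g. 24 views when step_deg=15

    A smaller step yields more views per object at the cost of a larger dataset; how such views are selected and used is governed by the thesis's angular view algorithm.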

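    FDDML and HDDML build on deep mutual learning (DML, Zhang et al. 2018) and knowledge distillation (KD, Hinton et al. 2015). The sketch below shows only the standard ingredients those two methods contribute: a cross-entropy term on the labels plus a temperature-scaled KL term toward another network's softened predictions. The temperature T and weight alpha are illustrative, and the exact way FDDML and HDDML assemble these terms is the thesis's own contribution, not reproduced here.

    import torch.nn.functional as F

    def soft_kl(logits_a, logits_b, T=4.0):
        # KL(softmax(logits_b / T) || softmax(logits_a / T)), scaled by T^2
        # as in Hinton et al.'s knowledge distillation.
        log_p_a = F.log_softmax(logits_a / T, dim=1)
        p_b = F.softmax(logits_b / T, dim=1)
        return F.kl_div(log_p_a, p_b, reduction="batchmean") * (T * T)

    def mutual_distillation_loss(logits_a, logits_b, targets, alpha=0.5, T=4.0):
        # Loss for network A in a two-network cohort: supervised cross-entropy
        # plus a soft-target term toward peer (or teacher) B. B is detached so
        # only A is updated here; B receives the symmetric loss in its own step.
        ce = F.cross_entropy(logits_a, targets)
        kd = soft_kl(logits_a, logits_b.detach(), T=T)
        return (1.0 - alpha) * ce + alpha * kd

    In plain DML the two networks are peers trained simultaneously, each taking the other as `logits_b`; in KD, `logits_b` comes from a fixed pre-trained teacher. Per the abstract, FDDML and HDDML combine these two regimes.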
    Table of Contents:
    ABSTRACT i
    ACKNOWLEDGMENTS iii
    CONTENTS iv
    LIST OF FIGURES vii
    LIST OF TABLES ix
    1. INTRODUCTION 1
      1.1 Motivation 1
      1.2 Outline and contributions 2
        1.2.1 Part I: Object recognition using angular view technique 2
        1.2.2 Part II: Knowledge transfer with FDDML and HDDML 3
    2. OBJECT RECOGNITION USING ANGULAR VIEW TECHNIQUE 4
      2.1 Overview 4
      2.2 Literature review 6
      2.3 Research method 8
        2.3.1 Angular view algorithm 9
        2.3.2 Segmentation 11
        2.3.3 The effect of the angular view on system performance 12
        2.3.4 Integrated system 13
        2.3.5 Comparison with other incremental learning methods 15
      2.4 Experiments 16
        2.4.1 Datasets and evaluation 16
        2.4.2 Experimental procedure 18
        2.4.3 Evaluation 19
        2.4.4 Comparison with others 24
        2.4.5 Evaluation of the integrated system 25
      2.5 Conclusion 28
    3. KNOWLEDGE TRANSFER WITH FDDML AND HDDML 29
      3.1 Overview 29
      3.2 Literature review 32
      3.3 The proposed FDDML and HDDML models 33
        3.3.1 DML and KD 34
        3.3.2 Full Deep Distillation Mutual Learning (FDDML) 36
        3.3.3 Half Deep Distillation Mutual Learning (HDDML) 38
      3.4 Experiments 39
        3.4.1 Dataset 39
        3.4.2 Neural Net Models 39
          3.4.2.1 On CIFAR-100: Training and Testing 39
          3.4.2.2 On TinyImageNet, 64×64 Image Size: Training and Testing 40
          3.4.2.3 On TinyImageNet, 32×32 Downsampled Image Size: Training and Testing 40
          3.4.2.4 On Cinic-10, 32×32 Image Size: Training and Testing 40
        3.4.3 Implementation Details 40
        3.4.4 Evaluation on CIFAR-100 41
        3.4.5 Evaluation on TinyImageNet 43
        3.4.6 Evaluation on Cinic-10 46
      3.5 Conclusion 48
    4. LIMITATIONS AND FUTURE DIRECTIONS 50
    REFERENCES 51
