
Student: Shih-En Lee (李詩恩)
Thesis Title: Real-time Musical Conducting Gesture Recognition Based on a Dynamic Time Warping Classifier Using a Depth Camera
Advisor: Chin-Shyurng Fahn (范欽雄)
Committee Members: Chi-Fang Lin (林啟芳), Jiann-Der Lee (李建德), Huei-Wen Ferng (馮輝文)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2018
Academic Year of Graduation: 106
Language: Chinese
Pages: 61
Keywords (Chinese): human-computer interaction, dynamic gesture recognition, musical conductor, dynamic time warping
Keywords (English): palm tracking, musical gesture, musical conductor
Abstract (Chinese):
Over the past few years, people have had ever more opportunities to interact and connect with computers. New technologies in daily life let us interact with computers conveniently, and even control them through hand gestures. As science and technology evolve rapidly, people keep looking for new ways to communicate and interact with machines.
This thesis uses a depth camera to recognize a musical conductor's movements in real time. First, we use the Kinect software development kit to capture the user's skeleton. From the skeleton, we locate the position of the palm and then track the hand gesture. We classify the gestures with Dynamic Time Warping, and our method also determines the conducting tempo.
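As an illustration of the tempo step, the following is a minimal sketch that estimates beats per minute from the timestamps of detected beat points; the function name and the assumption that beats have already been detected (for example, at turning points of the palm trajectory) are hypothetical, not taken from the thesis.

```python
from statistics import mean

def estimate_bpm(beat_times):
    """Estimate conducting tempo in beats per minute.

    beat_times: timestamps (seconds) of detected beat points, e.g. the
    turning points of the tracked palm trajectory (hypothetical input).
    """
    if len(beat_times) < 2:
        return None  # need at least one inter-beat interval
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / mean(intervals)

# Example: a beat every 0.5 s corresponds to 120 BPM
print(estimate_bpm([0.0, 0.5, 1.0, 1.5, 2.0]))  # 120.0
```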
We collected three kinds of musical conducting motions from five different people. The motions were tested at five different tempos and seven different camera angles, for a total of 5,600 conducting motions, and the experimental results show an average recognition rate of 89.17% for these gestures. The camera used in the experiments captures 30 frames per second.


Abstract (English):
In the past few years, human-computer interaction has become increasingly important in our everyday lives, and as a result we are continually finding new ways to interact with the computers around us. Gesture-based interaction with computers has gradually become mainstream, but it brings technical difficulties and issues that remain to be resolved. With the rapid development of science and technology, people are seeking new ways to communicate and interact with the machines around them.
In this thesis, we use a single depth camera to capture image input and build a real-time dynamic gesture recognition system. Using the Kinect software development kit, we obtain a skeleton model that tracks the position of the user's palm as it moves. Data are collected while the palm is making a gesture, so results are available in real time. Once the data are collected, the Dynamic Time Warping algorithm determines which gesture is being made, and our own algorithm estimates how fast the gesture is performed. The result is a real-time dynamic gesture recognition system for musical conductors.
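To make the classification step concrete, here is a minimal sketch of a one-nearest-neighbor Dynamic Time Warping classifier over 2-D palm trajectories. It assumes gestures arrive as lists of (x, y) palm positions; the structure and names are illustrative, not the thesis's implementation.

```python
import math

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two trajectories,
    each given as a list of (x, y) palm positions."""
    n, m = len(a), len(b)
    # dp[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            dp[i][j] = cost + min(dp[i - 1][j],      # skip a point in a
                                  dp[i][j - 1],      # skip a point in b
                                  dp[i - 1][j - 1])  # align the two points
    return dp[n][m]

def classify(gesture, templates):
    """Return the label whose reference trajectory is nearest under DTW.

    templates: dict mapping gesture label -> reference trajectory.
    """
    return min(templates, key=lambda label: dtw_distance(gesture, templates[label]))
```

In practice one would keep several reference trajectories per gesture (for example, recorded at different tempos) and normalize trajectories for position and scale before matching.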
We collected three basic musical gestures at five different speeds and seven different angles from five different people, yielding a database of 5,600 gesture samples. The experiments show that the average recognition accuracy of these gestures is 89.17%. The proposed system achieves real-time recognition at 30 frames per second.

Contents:
Chinese Abstract
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Overview
  1.2 Motivation
  1.3 System Description
  1.4 Thesis Organization
Chapter 2 Background and Related Work
  2.1 Glove Based Gesture Recognition
    2.1.1 The Data Glove
    2.1.2 Inertial Measurement Glove
  2.2 Vision Based Gesture Recognition
    2.2.1 Using Dynamic Time Warping
    2.2.2 Conducting Feature Points
  2.3 Depth Based Gesture Recognition
    2.3.1 Hand Pose Segmentation for Depth Image
    2.3.2 Capturing Gestures Using Depth Image Sensor
Chapter 3 Musical Conducting Gesture Recognition
  3.1 Hand Tracking
  3.2 Separating Continuous Gestures into Single Gestures
  3.3 Tempo Recognition
  3.4 Dynamic Time Warping Classifier
Chapter 4 Experimental Results and Discussions
  4.1 Experimental Setup
  4.2 Database Setup
  4.3 Results of Single Musical Gesture Recognition
  4.4 Results of Continuous Musical Gesture Recognition
  4.5 Comparison of Existing Methods and Our Method
Chapter 5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
References


Full Text Release Date: 2023/07/09 (campus network)
Full Text Release Date: 2028/07/09 (off-campus network)
Full Text Release Date: 2028/07/09 (National Central Library: Taiwan NDLTD system)