
Author: Yung-Sung Huang (黃永松)
Thesis title: The study of multimedia data processing for asymmetric multi-core processor platform (適用非對稱多核心處理器之多媒體資料處理之研究)
Advisor: Bin-Chang Chieu (邱炳樟)
Oral defense committee: Ching-Wen Hsue (徐敬文), Jong-Woei Whang (黃忠偉), Hsin-Teng Hsu (許新添), 趙和昌
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2015
Academic year of graduation: 103
Language: English
Number of pages: 93
Keywords: Multimedia data processing (多媒體資料處理)
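  • The combination of multimedia and mobile devices has become a mainstream consumer product, and these devices perform multimedia signal processing such as video and streaming more and more often, so making that processing run smoothly has become a popular topic. Many hardware vendors raise performance by adding CPU cores, while others use a GPU (Graphic Processing Unit) or a DSP (Digital Signal Processor) dedicated to mathematical operations to accelerate image processing. On the software side, the goal is to meet timing requirements by managing the memory that stores data, controlling program processes, and simplifying algorithms. Handling software and hardware independently in these ways cannot deliver the best performance improvement. The focus of this dissertation is therefore to exploit the characteristics of the mobile device's hardware platform, together with software management, to tune the performance of the multimedia processor architecture appropriately.
    There are two common ways to raise hardware performance. One is to increase the number of CPU cores, but such processors work sequentially, which makes multimedia playback programs running at the operating-system layer harder to develop, so this is not the hardware platform we choose. The other, which is our choice, is an SoC (System On Chip) that integrates a CPU based on RISC (Reduced Instruction Set Computing) with a DSP. Because the two processors do different kinds of work, we use a scheduling algorithm to assign tasks for parallel processing. Earlier approaches fix the schedule first, and the system then decides in a fixed pattern which tasks run on the RISC core and which run on the DSP. That keeps one chip computing all the time and fails to divide the work effectively. The most common symptom is that, after starting several multimedia computations, the system cannot carry out simple input or output operations, because all system resources are held by the heavily loaded processor. We instead monitor resources in real time to decide how tasks should be dispatched, so that resources are used to the full. In addition, cores of different kinds have different architectures and different data transfer units. To reduce the slowdown caused by the resulting communication difficulties, we add a memory management component, DMA (Direct Memory Access), to cut the time spent on communication between the two processors. Finally, to handle the most computation-intensive multimedia processing technique, we propose a dynamic quality control algorithm, which mainly improves how encoding is managed so that streaming playback stays smooth. By monitoring system performance, it knows when the DSP can take on more work and when encoding work on the DSP should be reduced, so that system resources are not tied up by a single computation.
    The hardware design of handheld devices emphasizes that they can be carried and used for long periods. Admittedly, multimedia applications are becoming more and more widespread, and all of them consume processor computation; the more computation they consume, the more battery charge is drained in proportion. If battery capacity cannot grow, long battery life cannot be sustained, and the hardware framework cannot be changed either. We therefore propose improving performance through software, so that users can run rich multimedia for long periods on thin, light handheld devices.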


    The combination of multimedia and handheld devices has become a mainstream consumer product. Such devices process multimedia signals such as video and streams more and more frequently, and how to make that processing smooth has become a hot topic. Many hardware manufacturers improve performance by adding more CPU cores, while others use a Graphic Processing Unit (GPU) or a Digital Signal Processor (DSP) to accelerate graphics processing. On the software side, the aim is to meet timing requirements by managing memory, controlling program processes, and simplifying algorithms. Treating software and hardware independently in this way cannot deliver the best performance improvement. This dissertation therefore focuses on exploiting the characteristics of the handheld hardware platform and combining them with software management to allocate performance appropriately within the multimedia processor architecture.
    There are two common methods of improving hardware performance. One is to add more CPU cores, but these cores process work sequentially, which makes multimedia players at the application layer harder to write, so we do not choose this kind of platform. The other, which is our choice because of the DSP's parallel processing, is a System on Chip (SoC) that integrates a CPU based on Reduced Instruction Set Computing (RISC) with a DSP. We use a scheduling algorithm to assign parallel processing tasks to the two different processors. This differs from the previous solution, which first fixes the schedule and then distributes tasks to the RISC core and the DSP in a fixed way. With the fixed approach one chip keeps computing, so task partitioning is ineffective. A common scenario is that the system cannot handle simple input/output once several multimedia computing tasks are running, because all system resources are occupied by the heavily loaded processor. We instead decide task assignment by monitoring resources in real time, so as to make the most of the available resources. In addition, cores of different kinds have different architectures and different data transfer units. To reduce the performance degradation these differences cause, Direct Memory Access (DMA) is added to cut the time spent on communication between the two processors. Finally, to address image compression, the multimedia processing technique with the heaviest computation, a dynamic quality control algorithm is proposed. It mainly improves encoding performance so that stream playback is smooth. By monitoring system performance, it knows when to assign more tasks to the DSP and when to reduce the DSP's encoding workload, which keeps system resources from being tied up by a single computation.
    The hardware design of handheld devices emphasizes portability. Admittedly, multimedia is being applied more and more widely, and running it consumes processor computation; the more energy is consumed, the faster the battery drains in proportion. If battery capacity cannot be increased, long battery life cannot be achieved. We therefore propose improving performance through software, which lets users run rich multimedia for a long time on light and slim handheld devices.
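
    The resource-monitoring dispatch and the quality adjustment summarized in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the `Core` class, the `dispatch` and `adjust_quality` functions, the speed figures, the load thresholds, and the 1 to 10 quality scale are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Core:
    """A processing unit (hypothetical model, not the thesis data structure)."""
    name: str
    speed: float          # assumed: work units processed per millisecond
    pending: float = 0.0  # work units currently queued on this core

    def finish_time(self, extra: float = 0.0) -> float:
        """Estimated time (ms) to drain the queue plus `extra` work units."""
        return (self.pending + extra) / self.speed


def dispatch(task_cost: float, cores: List[Core]) -> Core:
    """Dynamic dispatch: pick the core that would finish the task soonest
    given its current queue, instead of using a fixed task-to-core mapping."""
    best = min(cores, key=lambda c: c.finish_time(task_cost))
    best.pending += task_cost  # a real monitor would also subtract finished work
    return best


def adjust_quality(quality: int, dsp_load: float,
                   high: float = 0.85, low: float = 0.50,
                   q_min: int = 1, q_max: int = 10) -> int:
    """Lower the encoding quality level when the DSP is saturated and raise it
    when the DSP has headroom. Thresholds and scale are assumptions."""
    if dsp_load > high:
        return max(quality - 1, q_min)
    if dsp_load < low:
        return min(quality + 1, q_max)
    return quality


# Three heavy media tasks followed by one light I/O task.
arm = Core("ARM", speed=1.0)
dsp = Core("DSP", speed=4.0)
placements = [dispatch(cost, [arm, dsp]).name for cost in (8, 8, 8, 1)]
print(placements)                         # -> ['DSP', 'DSP', 'DSP', 'ARM']
print(adjust_quality(5, dsp_load=0.95))   # -> 4
```

    In this toy run the light task lands on the ARM core because the DSP queue is already long, which is the situation a fixed mapping handles poorly according to the abstract.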

    Abstract
    Chinese abstract
    Acknowledgement
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Motivation
    1.2 The proposed optimization approaches and their contributions
    1.2.1 Architecture for video coding on processor with an ARM and DSP
    1.2.2 A video decoding optimization for asymmetric dual-core platform architecture
    1.2.3 Architecture for video streaming application on heterogeneous
    1.3 Organization
    Chapter 2 Architecture for video coding on a processor with an ARM and DSP
    2.1 System overview
    2.2 Asymmetric multi-core processing architecture
    2.3 Internal-processor communication mechanism
    2.3.1 Implement processing
    2.3.1.1 Implement processing for ARM side
    2.3.1.2 Implement processing for DSP side
    2.4 Intra prediction processing
    2.5 Interpolation processing
    2.6 Block matching algorithm for motion estimation
    2.6.1 Search block by three step search
    2.6.2 Comparison of related search block algorithms
    2.7 Temporal redundancy processing
    2.8 Inter prediction processing
    2.9 How to platform module and task with CPU
    Chapter 3 A video decoding optimization for asymmetric dual-core platforms architecture
    3.1 A discussion about dual-core processor design
    3.1.1 Access unit consistency
    3.1.2 Sequence switching issue
    3.1.3 Consistency of data sharing issue
    3.2 Asymmetric multi-core decoding architecture
    3.2.1 Task processing unit
    3.2.2 Task distributor and task interface
    3.3 H.264 bitstream transcoder
    3.3.1 Spatial downscaling
    3.3.2 Mixed block
    3.4 H.264 key frame generator
    3.4.1 Scene change
    3.4.2 Playback control elements
    3.5 Optimization for DMA architecture
    3.5.1 Dual-core processing architecture
    3.5.2 Internal-processor communication mechanism
    3.5.2.1 Implement processing for ARM side
    3.5.2.2 Implement processing for DSP side
    3.5.3 DMA planning
    3.5.4 Dynamically partitioned H.264 decoder
    3.5.4.1 I frame prediction decoding
    3.5.4.2 Interpolation processing
    3.5.4.3 Temporal redundancy
    3.5.4.4 P frame prediction decoding
    Chapter 4 Architecture for video streaming application on asymmetric platform
    4.1 System overview
    4.2 Dynamically adjust encoder parameter
    4.2.1 Architecture description
    4.2.2 The problems with formulation
    4.2.3 Proposed algorithm
    Chapter 5 Experimental results and assist
    5.1 Introductions
    5.2 Experiment of static task scheduling and dynamic task scheduling comparisons
    5.3 Experimental results processed in the I frame
    5.4 Experimental results processed in the P frame
    5.5 Experiment on streaming library performance
    5.6 Performance improvement summary of DMA architecture
    5.7 Performance comparisons of benchmarking
    Chapter 6 Conclusions
    6.1 Conclusions
    6.1.1 Architecture for video streaming
    6.1.2 Architecture for video decoding optimization
    References


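    Full-text release date: 2020/02/04 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: Taiwan Theses and Dissertations System)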