
Author: Wei-hung Wen (文為弘)
Thesis title: ARM Cortex-A8 NEON Architecture Study and Implementation - Based on TI OMAP3530 (ARM Cortex-A8 NEON架構研究與實作-以TI OMAP3530為例)
Advisor: Mon-Chau Shie (許孟超)
Committee members: Shanq-Jang Ruan (阮聖彰), Chin-Hsien Wu (吳晉賢), Chang-Hong Lin (林昌鴻)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science (電資學院 - 電子工程系)
Year of publication: 2011
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Number of pages: 67
Keywords (Chinese): 嵌入式系統, ARM Cortex-A8, ARMv7, 增強型SIMD, NEON架構
Keywords (English): Embedded System, ARM Cortex-A8, ARMv7, Advanced SIMD, NEON

Multimedia audio and video applications have become a principal workload on every kind of computing platform. As multimedia content grows more diverse and complex, the demands on hardware specifications rise accordingly. Unlike a general-purpose PC, an embedded system does not have abundant hardware resources; its design must balance high performance against low cost and low power consumption. How to improve multimedia data processing performance with limited resources is therefore a central problem in embedded system design.

To address this need, ARM included the Advanced SIMD architecture, also known as the NEON multimedia processing engine, in its ARMv7 instruction set architecture. NEON is a highly competitive addition relative to earlier generations of ARM processors. Audio and video data mostly come in fixed-width data types: the color components of an image pixel are usually 8 bits wide, while audio samples are usually 16 bits wide. For a 32-bit ARM CPU, processing such narrow data is relatively inefficient, which motivated NEON as an aid to data processing. NEON supports parallel operations on vectors up to 128 bits wide, realizing the SIMD (Single Instruction, Multiple Data) model.

This thesis analyzes, from a whole-system perspective, how the NEON instruction set can be used to raise system performance and optimize code. It also presents a design for an H.264/AVC software decoder that applies NEON on the TI OMAP3530 platform, which is based on the ARM Cortex-A8 core. The experiments use ARM-Linux as the embedded operating system. We compare color space conversion performance with and without NEON on the actual system and analyze the code, and we port and optimize a NEON-based build of MPlayer as the H.264 video playback platform.

According to the experimental results, using the NEON instruction set on the TI OMAP3530 for RGB-to-Gray conversion of a 640x480 image with 24-bit color depth reduces processing time by approximately 23.5%. In playback experiments with H.264 videos at five different resolutions, decoding time falls by about 11% to 13% on average. In the system-level power measurements, enabling NEON increases the maximum power consumption by only about 3.87% compared with NEON disabled.
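
To make the RGB-to-Gray case concrete, the following is a minimal sketch of how such a conversion might be vectorized with NEON intrinsics. It is not the thesis's code. The function names `rgb_to_gray` and `rgb_to_gray_scalar` are hypothetical, and the integer weights 77, 151 and 28 (which sum to 256) are an assumed fixed-point approximation of a luma formula, not values taken from the thesis. When the compiler does not target NEON, the same function falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Assumed integer weights (not from the thesis): 77 + 151 + 28 = 256,
 * so the weighted sum is divided by 256 with a right shift of 8. */
enum { W_R = 77, W_G = 151, W_B = 28 };

/* Scalar reference: one pixel per iteration, input is packed R,G,B bytes. */
void rgb_to_gray_scalar(const uint8_t *rgb, uint8_t *gray, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; ++i) {
        unsigned sum = W_R * rgb[3 * i]
                     + W_G * rgb[3 * i + 1]
                     + W_B * rgb[3 * i + 2];
        gray[i] = (uint8_t)(sum >> 8);
    }
}

/* Eight pixels per iteration when NEON is available; the scalar loop
 * handles the remaining pixels (or all of them without NEON). */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, size_t n_pixels)
{
    size_t i = 0;
#if HAVE_NEON
    const uint8x8_t wr = vdup_n_u8(W_R);
    const uint8x8_t wg = vdup_n_u8(W_G);
    const uint8x8_t wb = vdup_n_u8(W_B);
    for (; i + 8 <= n_pixels; i += 8) {
        uint8x8x3_t px = vld3_u8(rgb + 3 * i);     /* de-interleave R, G, B */
        uint16x8_t acc = vmull_u8(px.val[0], wr);  /* widening multiply */
        acc = vmlal_u8(acc, px.val[1], wg);        /* multiply-accumulate */
        acc = vmlal_u8(acc, px.val[2], wb);
        vst1_u8(gray + i, vshrn_n_u16(acc, 8));    /* shift by 8, narrow */
    }
#endif
    rgb_to_gray_scalar(rgb + 3 * i, gray + i, n_pixels - i);
}
```

For a 640x480 frame (307,200 pixels), the vector loop in this sketch would run 38,400 times instead of 307,200 scalar iterations. With GCC on a 32-bit ARM target, the NEON path is typically enabled by passing `-mfpu=neon`; the speedup actually obtained depends on memory bandwidth and the rest of the pipeline, so it should not be read off the lane count alone.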

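Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Related Technologies and Research
  1.3 Research Method
  1.4 Thesis Organization
Chapter 2 Background Knowledge
  2.1 ARM Processor Core Classification and Architecture
  2.2 NEON Architecture Analysis
    2.2.1 SIMD (Single Instruction Multiple Data)
    2.2.2 NEON Architecture and Instruction Set
    2.2.3 NEON vs. VFP
  2.3 ARM-Linux Embedded System Development Flow
  2.4 H.264/AVC Video Coding Standard
Chapter 3 System Design and Implementation
  3.1 Hardware Environment
    3.1.1 Devkit8000 Development Board
    3.1.2 TI OMAP3530
    3.1.3 Video Output Interface
    3.1.4 Audio Output Interface
  3.2 Software Environment
    3.2.1 Setting Up the Cross-Compiler Development Environment
    3.2.2 Setting Up the NFS Development Environment
    3.2.3 Boot-Loader
    3.2.4 Linux Kernel Customization
  3.3 Compiling and Assembling the NEON Instruction Set
  3.4 MPlayer Porting Process
  3.5 H.264/AVC Video Decoding Flow Analysis
Chapter 4 Experimental Results and Analysis
  4.1 NEON Architecture Software Code Optimization Analysis
  4.2 NEON Architecture Playback Performance Analysis
  4.3 NEON Architecture System Power Consumption Analysis
Chapter 5 Conclusions and Future Research Directions
References
About the Author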

