簡易檢索 / 詳目顯示

研究生: 林德盛
De-Sheng Lin
論文名稱: 以NoC架構為基礎的快速傅立葉轉換系統設計
The Design of a NoC-Based Fast Fourier Transform System
指導教授: 林銘波
Ming-Bo Lin
口試委員: 陳郁堂
Yie-Tarng Chen
楊兆華
none
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2015
畢業學年度: 103
語文別: 中文
論文頁數: 86
中文關鍵詞: 晶片網路二維網格蟲洞路由器快速傅立葉轉換CORDIC
外文關鍵詞: NoC, Fast Fourier Transform, CORDIC
相關次數: 點閱:299下載:7
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 由於半導體製程的快速演進,使的單一晶片可以容納更多的矽智財。然而,傳統匯流排(bus) 無法應付多核心系統大量傳輸的需求。本論文提出一個適用於快速傅立葉轉換的晶片網路(Network-on-Chip) 硬體架構,試圖以平行運算的方式提升運算效能。本論文的硬體架構包含前處理器、路由、網路介面和處理單元。路由器的設計採用蟲洞(wormhole) 的封包交換技術,透過管線化的設計使封包傳輸更有效。封包的路徑規劃則以XY 路由演算法實現,具有低成本與低延遲的特性。針對快速傅立葉轉換所需的資料次序相依性,提出資料排序及分配的方法,增加網路的負載平衡、縮短交換路徑和增加運算的效率。處理單元採用Radix-22 及CORDIC 取代傳統複數乘法。除此之外,建立在二維網格(2D mesh)拓撲上的路由器可以透過參數化的設計實現各種大小的晶片網路。
    我們在Xilinx Virtex 6 FPGA 和TSMC 0.18 μm 元件庫上實現具有256 點運算能力的2 × 2、4 × 4 和8 × 8的晶片網路。在FPGA 部分,實現的硬體使用由Matlab 產生的波形資料進行測試,並分析不同網路大小運算16 、64 及256 點的效能。在0.18 μm元件庫的部分,工作頻率皆可達到66 MHz,其中運算256點的8×8晶片網路,晶片面積為11,302.26 μm × 11,300.16 μm,等效閘數為8,099,544,功率消耗為2693.4 mW。


    With the advent of semiconductor technology, a large number of IP blocks can be integrated into a single chip. Unfortunately, in such a multicore SoC system, the bus-based architecture may become the bottleneck of system performance. To overcome this, the NoC-based architecture has been proposed in the past decade.
    To demonstrate this architecture, a NoC-based architecture for implementing Fast Fourier Transform (FFT) is proposed in this thesis. The architecture consists of preprocessors, routers, network interfaces, and processing elements. To make the router efficient, pipeline technique and wormhole switching are used with the XY routing algorithm. In addition, a method of data scheduling and distributing is developed to increase the load balancing and efficiency of packet distribution of the data sequence requirement of the FFT. The processing element adopts the Radix-22 algorithm and uses CORDIC to replace the traditional complex multiplier. Moreover, the meshbased network is fully parameterized to make it flexible.
    Various sizes, including 2 × 2, 4 × 4, and 8 × 8, with 256 points capability, of the NoC-based architecture have been implemented and verified on a Xilinx Virtex- 6 device and the TSMC 0.18-μm cell library. The core area of the 8 × 8 NoC architecture is 11,302.26 μm × 11,300.16 μm, equivalent to 8,099,544 gates. The average power consumption is 2693.4 mW at the operating frequency of 66 MHz.

    第一章序論 1.1 研究動機 1.2 研究方向 1.3 章節安排 第二章晶片網路的背景知識 2.1 匯流排與晶片網路 2.1.1 匯流排 2.1.2 晶片網路 2.2 網路拓撲 2.3 流控方法 2.3.1 線路交換 2.3.2 封包交換 2.3.3 儲存轉送 2.3.4 虛擬直通 2.3.5 蟲洞 2.3.6 虛擬通道 2.4 路由演算法 2.4.1 死鎖問題 2.4.2 非適應性演算法 2.4.3 適應性演算法 第三章快速傅立葉演算法 3.1 簡介 3.2 雙轉子因子(Twiddle Factor) 3.3 DIF Radix-2演算法 3.4 DIF Radix-4演算法 3.5 DIF Radix-22 演算法 3.6 DIF Radix-2/4 演算法 3.7 演算法比較 第四章FFT基礎架構 4.1 簡介 4.2 單一路徑回授延遲架構 4.2.1 Radix-2 SDF架構 4.2.2 Radix-4 SDF架構 4.2.3 Radix-22 SDF架構 4.3 多路徑延遲架構 4.3.1 Radix-2 MDC架構 4.3.2 Radix-4 MDC架構 4.3.3 Radix-22 MDC架構 4.4 管線式架構的比較 第五章基於晶片網路的快速傅立葉系統設計 5.1 系統架構 5.2 系統控制器 5.3 資料排序器與記憶體 5.4 封包分配器與回存器 5.4.1 資料與網路的分配關係 5.4.2 資料抓取策略 5.4.3 回存策略 5.5 網路架構 5.6 路由器的設計 5.6.1 輸入單元 5.6.2 分配器 5.6.3 Crossbar 5.6.4 路由器的管線化設計 5.7 處理單元 5.7.1 封包格式 5.7.2 處理單元間的資料交換 5.7.3 網路介面 5.7.4 儲存單元 5.7.5 仲裁單元 5.7.6 FFT處理單元 第六章實驗結果與效能評估 6.1 實驗架構 6.2 FPGA效能評估 6.3 研究比較 第七章晶片實現 7.1 元件庫的實現 7.1.1 元件庫合成與效能分析 7.1.2 自動化佈局 第八章結論

    [1] N. Prasad, I. Chakrabarti, and S. Chattopadhyay, “NoC Based Multiplier-Less
    Constant Geometry FFT Architecture,” 2014 Fourth International Confer-
    ence of Emerging Applications of Information Technology (EAIT), pp. 173–178,
    2014.
    [2] L. Benini and G. De Micheli, “Networks on Chips: A New SoC Paradigm,”
    Computer, vol. 35, pp. 70–78, Jan. 2002.
    [3] S. Barua, R. Thulasiram, and P. Thulasiraman, “Improving Data Locality in
    Parallel Fast Fourier Transform Algorithm for Pricing Financial Derivatives,”
    18th International Parallel and Distributed Processing Symposium, pp. 26–30,
    2004.
    [4] Z. Cui-xiang, H. Guo-qiang, and H. Ming-he, “Some New Parallel Fast Fourier
    Transform Algorithms,” Sixth International Conference on Parallel and Dis-
    tributed Computing, Applications and Technologies, pp. 624–628, 2005.
    [5] J. H. Bahn, J. Yang, and N. Bagherzadeh, “Parallel FFT Algorithms on
    Network-on-Chips,” Fifth International Conference on ITNG, pp. 1087–1093,
    2008.
    [6] M. Jiang, B. Yang, and R. H. et al., “Multiplierless Fast Fourier Transform
    Architecture,” Electronics Letters, vol. 43, pp. 191–192, 2007.
    [7] Y. Jung, H. Yoon, and J. Kim, “New Efficient FFT Algorithm and Pipeline
    Implementation Results for OFDM/DMT Applications,” IEEE Transactions
    on Consumer Electronics, vol. 49, pp. 14–20, 2003.
    [8] D. Bidet, E.and Castelain, C. Joanblanq, and P. Senn, “A Fast Single-Chip
    Implementation of 8192 Complex Point FFT,” IEEE Joural of Solid-State Cir-
    cuits, vol. 20, pp. 300–305, 1995.
    [9] R. Jiang, “An Area-Efficient FFT Architecture for OFDM Digital Video Broadcasting,”
    IEEE Transactions on Consumer Electronics, vol. 53, pp. 1322–1326,
    2007.
    [10] C.-H. Lin and A.-Y. Wu, “Mixed-Scaling-Rotation CORDIC (MSR-CORDIC)
    Algorithm and Architecture for High-Performance Vector Rotational DSP Applications,”
    IEEE Transactions On Circuits and Systems- I (TCAS-I), vol. 52,
    pp. 2385–2396, 2005.
    [11] C.-S. Wu and A.-Y. Wu, “Modified Vector Rotational CORDIC (MVRCORDIC)
    Algorithm and Architecture,” IEEE Transactions On Circuits and
    Systems-II (TCAS-II): Analog and Digital Signal Processing, vol. 48, pp. 548–
    561, 2001.
    [12] P. K. Meher, J. Valls, and T.-B. J. et al., “50 Years of CORDIC: Algorithms,
    Architectures, and Applications,” IEEE Transactions on Circuits and Systems
    I: Regular Papers, vol. 56, pp. 1893–1907, 2009.
    [13] O. Sarbishei and K. Radecka, “On the Fixed-Point Accuracy Analysis and Optimization
    of FFT Units with CORDIC Multipliers,” IEEE Symposium on Com-
    puter Arithmetic (ARITH), pp. 25–27, 2011.
    [14] S. Abdullah, H. Nam, M. McDermot, and J. Abraham, “A High Throughput
    FFT Processor with No Multipliers,” IEEE International Conference on Com-
    puter Design, pp. 485–490, 2009.
    [15] R.-X. Gong, J.-Q. Wei, D. Sun, L.-L. Xie, P.-F. Shu, and X.-B. Meng, “FPGA
    Implementation of A CORDIC-Based Radix-4 FFT Processor for Real-Time
    Harmonic Analyzer,” 2011 Seventh International Conference on Natural Com-
    putation (ICNC), vol. 4, pp. 1832–1835, 2011.
    [16] G. Zhang and F. Chen, “Parallel FFT with CORDIC for Ultra Wide Band,”
    IEEE International Symposium on Personal, Indoor and Mobile Radio Com-
    munications, vol. 2, pp. 1173–1177, 2004.
    [17] H. G. Lee, N. Chang, and U. Y. Ogras et al., “On-chip communication architecture
    exploration: a quantitative evaluation of point-to-point, bus, and networkon-
    chip approaches,” ACM Transactions on Design Automation of Electronic
    Systems (TODAES), vol. 12, pp. 1–20, Aug. 2008.
    [18] J. Duato, S.Yalamanchili, and L. Ni, Interconnection Networks: An Engineering
    Approach. Morgan Kaufmann Publishers, 2003.
    [19] W. Dally and B. Towles, Principles and Practices of Interconnection Networks.
    Morgan Kaufmann Publishers, 2005.
    [20] S. Scott, D. Abts, and J. Kim et al., “The BlackWidow High-Radix Close Network,”
    Proceedings of the 33rd International Symposium on Computer Archi-
    tecture, pp. 16–28, Jul. 2006.
    [21] E. Salminen, A. Kulmala, and T. D. Hmlinen, “Survey of Network-on-chip Proposals.”
    http://www.ocpip.org, Mar. 2008.
    [22] W. J. Dally, “Virtual-channel Flow Control,” IEEE Transactions on Parallel
    and Distributed Systems, vol. 3, pp. 194–205, Mar. 1992.
    [23] L. G. Valiant and G. J. Brebner, “Universal Schemes for Parallel Communication,”
    Proceedings of the 13th Annual ACM Symposium on Theory of Comput-
    ing (STOC'81), pp. 263–277, 1981.
    [24] T. Nesson and S. L. Johnsson, “ROM Routing on Mesh and Torus Networks,”
    Proceedings of the 7th Annual ACM Symposium on Parallel Algorithms and
    Architectures (SPAA'95), pp. 275–287, Jul. 1995.
    [25] C. Glass and L. Ni, “The Turn Model for Adaptive Routing,” Proceedings of the
    19th Annual International Symposium on Computer Architecture, pp. 278–287,
    May 1992.
    [26] G.-M. Chiu, “The Odd-even Turn Model for Adaptive Routing,” IEEE Trans-
    actions on Parallel and Distributed Systems, vol. 11, pp. 729–738, Jul. 2000.
    [27] C. Nicopoulos, D. Park, and J. Kim et al., “ViChaR: A Dynamic Virtual Channel
    Regulator for Network-on-chip Routers,” Proceedings of Annual IEEE/ACM
    International Symposium on Microarchitecture, pp. 333–346, Dec. 2006.
    [28] J. Hu and R. Marculescu, “DyAD - Smart Routing for Networks-on-chip,” Pro-
    ceedings of the 41st Annual Conference on Design Automation, pp. 260–263,
    Jul. 2004.
    [29] J. E. Volder, “The CORDIC Trigonometric Computing Technique,” IRE Trans-
    actions on Electronic Computers, vol. issue:3, pp. 330–334, 1959.

    QR CODE