
Graduate Student: 吳亭萱 (Ting-Xuan Wu)
Thesis Title: 基於LSTM/GRU結構循環神經網路圖像壓縮架構
Image Compression Using LSTM/GRU Based Recurrent Convolution Neural Network
Advisor: 陳建中 (Jiann-Jone Chen)
Committee Members: 鍾國亮 (Kuo-Liang Chung), 吳怡樂 (Yi-Leh Wu), 杭學鳴 (Hsueh-Ming Hang), 花凱龍 (Kai-Lung Hua)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Publication Year: 2018
Graduation Academic Year: 106 (2017-2018)
Language: Chinese
Pages: 65
Chinese Keywords: deep learning; image compression; convolutional neural network; recurrent neural network; LSTM memory cell; GRU gating unit
English Keywords: Deep Learning, Image Compression, CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), LSTM (Long Short Term Memory Network), GRU (Gated Recurrent Unit)

With the growth of the Internet, the amount of image data online keeps increasing, and users expect web pages to load ever faster. To meet this demand for fast, comfortable browsing, it is increasingly important to store images with fewer bits and to offer users low-resolution cached thumbnails for preview. Standard image compression algorithms such as JPEG and JPEG2000 already perform well: JPEG uses block-based coding built on the discrete cosine transform (DCT), while JPEG2000 instead uses multi-resolution coding built on the wavelet transform. Standard codecs, however, focus mainly on large images, whereas the web serves an enormous number of cached thumbnails; improving thumbnail compression can therefore markedly improve the browsing experience over low-bandwidth connections. Deep learning now makes it possible to design compression algorithms whose goal is to learn a transform better than the DCT or the wavelet transform, raising the compression ratio while preserving as much of the original image information as possible.
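As a point of reference for the block transform mentioned above, the sketch below (an illustrative NumPy implementation, not code from the thesis) builds the orthonormal DCT-II matrix that JPEG-style codecs apply to each 8x8 block; for a flat block, all the energy collapses into the single DC coefficient, which is why smooth regions compress so well under this transform.

```python
import numpy as np

def dct2_matrix(n):
    # Orthonormal DCT-II basis matrix: rows index frequency, columns index samples.
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= np.sqrt(1.0 / n)
    M[1:] *= np.sqrt(2.0 / n)
    return M

def block_dct2(block):
    # Separable 2-D DCT: transform rows, then columns.
    M = dct2_matrix(block.shape[0])
    return M @ block @ M.T

block = np.full((8, 8), 128.0)   # a flat 8x8 block of gray pixels
coeffs = block_dct2(block)       # only coeffs[0, 0] (the DC term) is nonzero
```

JPEG then quantizes these coefficients, spending few or no bits on the near-zero high-frequency terms.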
This thesis proposes an autoencoder network architecture. The model consists of a multi-layer recurrent convolutional neural network with two sub-models, an encoder and a decoder, and places a long short-term memory (LSTM) network or gated recurrent unit (GRU) structure at every node so that information can be passed across time steps. During training, the encoder and decoder are trained jointly in a single model, with the goal of preserving as much feature information as possible through compression and decompression; during testing, the encoder and decoder are evaluated separately, as encoding/decoding modules, in the same way as other image compression algorithms. Experiments show that at low compression ratios (BPP > 0.8) the proposed architecture matches or exceeds JPEG and other codecs in peak signal-to-noise ratio (PSNR) and multiscale structural similarity (MS-SSIM); at high compression ratios (BPP < 0.8) it surpasses JPEG and other codecs in MS-SSIM, reaching an MS-SSIM of 0.8941 at BPP 0.125 and 0.9814 at BPP 0.875, and produces better visual quality with sharper edges and richer textures.
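The gating that lets these networks carry information across time steps can be sketched numerically. The toy NumPy GRU step below uses dense weights in place of the thesis's convolutional ones, and one common gate convention; all parameter names are illustrative. The update gate z blends the previous hidden state with a candidate state, while the reset gate r controls how much past state feeds the candidate.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W, U, b):
    """One GRU step. W, U, b each stack the update (z), reset (r),
    and candidate (n) parameters along the first axis."""
    Wz, Wr, Wn = W
    Uz, Ur, Un = U
    bz, br, bn = b
    z = sigmoid(x @ Wz + h @ Uz + bz)        # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)        # reset gate
    n = np.tanh(x @ Wn + (r * h) @ Un + bn)  # candidate state
    return (1.0 - z) * h + z * n             # blend old state with candidate

d_in, d_h = 4, 8
W = rng.normal(size=(3, d_in, d_h)) * 0.1
U = rng.normal(size=(3, d_h, d_h)) * 0.1
b = np.zeros((3, d_h))

h = np.zeros(d_h)
for t in range(5):                            # state h persists across steps
    h = gru_cell(rng.normal(size=d_in), h, W, U, b)
```

Because the new state is a convex combination of the old state and a tanh-bounded candidate, the hidden state stays bounded while still accumulating information over iterations.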


The performance of image/video encoding methods, such as robustness to error attacks and compression ratio, is important for multimedia communication applications. The JPEG image compression standard adopts a block-based discrete cosine transform (DCT), and JPEG2000 utilizes the wavelet transform (WT) to provide multi-resolution compression. Both standards are efficient for current multimedia communication systems. In this research, we study how to utilize deep learning methods to compress image signals and provide performance comparable to DCT-based JPEG and WT-based JPEG2000.
We propose an autoencoder architecture for image compression based on a multi-layer recurrent convolutional neural network comprising encoder and decoder sub-models. At each network node, we use a long short-term memory (LSTM) network or gated recurrent units (GRU) to enable efficient information delivery across iterations. In the training phase, the encoding and decoding procedures are trained together to preserve the most image feature information through compression and decompression. In the testing phase, the encoding and decoding procedures are executed separately to verify performance. Experiments showed that when BPP > 0.8, the PSNR and MS-SSIM (multiscale structural similarity) performance of the proposed method is comparable to or better than that of JPEG. For high compression, BPP < 0.8, the proposed method outperforms JPEG and other codecs in MS-SSIM, demonstrating better visual quality, i.e., sharper edges, richer textures, and fewer artifacts.
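The variable-rate behaviour described above follows the progressive scheme of Toderici et al.: each pass encodes the current residual, emits a fixed number of bits, and refines the reconstruction, so more iterations mean a higher BPP. The toy below illustrates only this control flow; the random linear maps are hypothetical stand-ins for the trained recurrent encoder/decoder, so no compression quality should be read into it.

```python
import numpy as np

def quantize(code):
    # Toy 1-bit bottleneck: binarize features to {-1, +1}, the only lossy step.
    return np.sign(code) + (code == 0)

rng = np.random.default_rng(1)
E = rng.normal(size=(64, 16)) / 8.0   # stand-in encoder projection
D = rng.normal(size=(16, 64)) / 4.0   # stand-in decoder projection

x = rng.normal(size=64)               # flattened image patch
recon = np.zeros_like(x)
residual = x.copy()
errors = []
for step in range(8):                 # each pass adds 16 more bits
    bits = quantize(residual @ E)     # encode the *current* residual
    recon += bits @ D                 # additively refine the reconstruction
    residual = x - recon
    errors.append(float(np.mean(residual ** 2)))
```

With trained networks, each additional pass shrinks the residual, which is what lets a single model serve many bit rates; stopping the loop early simply yields a coarser, cheaper reconstruction.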

ABSTRACT
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background
  1.2 Motivation and Objectives
  1.3 Method Overview
  1.4 Thesis Organization
Chapter 2 Background and Related Work
  2.1 Standard Image Compression Algorithms
  2.2 Deep Learning Frameworks
  2.3 Deep Learning Fundamentals
  2.4 Related Work on Deep-Learning-Based Image Compression
Chapter 3 Proposed Method
  3.1 Image Compression Network Architecture
  3.2 Model Design
  3.3 Image Compression Pipeline
Chapter 4 Experimental Results
  4.1 Experimental Environment and Datasets
  4.2 Performance Metrics and Compared Methods
  4.3 Parameter Configuration
  4.4 Performance Verification and Comparison
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References

