
Graduate Student: Bei-Cheng Guo (郭倍誠)
Thesis Title: Single-Image HDR Reconstruction Based on Two-stage GAN Structure (基於兩階段生成對抗網路結構之單一影像高動態範圍重建)
Advisor: Chang-Hong Lin (林昌鴻)
Committee Members: Yuan-Hsiang Lin (林淵翔), Chin-Hsien Wu (吳晉賢), Ching-Shun Lin (林敬舜), Chang-Hong Lin (林昌鴻)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110 (2021-2022)
Language: English
Number of Pages: 73
Keywords: Single-Image HDR Reconstruction, Image Fusion, Convolutional Neural Network (CNN), Generative Adversarial Network (GAN), Deep Learning
Abstract:
    The luminance of real-world scenes spans a high dynamic range (HDR). However, most digital cameras can capture only a limited range of luminance due to hardware constraints, which results in low dynamic range (LDR) images. Because human eyes can perceive a much wider range of luminance, the goal of HDR reconstruction is to expand an LDR image back to the HDR image that we actually see. Recovering an HDR image from a single LDR image is challenging because the details in its under- and over-exposed regions have already been lost. In this thesis, we propose a novel two-stage model to address this problem. The first stage exploits the strengths of the generative adversarial network (GAN) and an attention mechanism to restore the missing details in the under- and over-exposed areas of the LDR image. The second stage is a multi-branch convolutional neural network (CNN) that fuses the multiple LDR images of different exposures produced by the first stage into a single HDR image. Finally, we adopt a joint learning strategy to fine-tune the entire model toward a global optimum. Quantitative comparisons show that our method achieves higher overall scores than existing state-of-the-art methods, and qualitative comparisons show that it produces smooth, noise-free HDR images.
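As a rough illustration of the two-stage design summarized above, the sketch below mirrors the inference flow in PyTorch ([42], the framework cited in the references): two stage-one generators hallucinate darker and brighter exposures from the single LDR input, and a stage-two multi-branch network fuses the exposure stack into one HDR image. The module names and layer choices (SmallUNetLike, ToyFusionNet) are hypothetical stand-ins, not the thesis' actual DBC-cGAN and image fusion network from Chapter 3.

    # Minimal two-stage sketch; toy stand-ins for the networks in Chapter 3.
    import torch
    import torch.nn as nn

    class SmallUNetLike(nn.Module):
        """Toy encoder-decoder standing in for one stage-one generator."""
        def __init__(self):
            super().__init__()
            self.encode = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decode = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decode(self.encode(x))

    class ToyFusionNet(nn.Module):
        """Toy multi-branch fusion: one branch per exposure, then a merge conv."""
        def __init__(self, num_branches=3):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
                for _ in range(num_branches)
            )
            self.merge = nn.Conv2d(16 * num_branches, 3, 3, padding=1)

        def forward(self, exposures):
            feats = [branch(x) for branch, x in zip(self.branches, exposures)]
            return torch.relu(self.merge(torch.cat(feats, dim=1)))

    darker_gen, brighter_gen = SmallUNetLike(), SmallUNetLike()
    fusion = ToyFusionNet(num_branches=3)

    ldr = torch.rand(1, 3, 128, 128)   # single LDR input, values in [0, 1]
    with torch.no_grad():
        darker = darker_gen(ldr)       # stage 1: recover over-exposed detail
        brighter = brighter_gen(ldr)   # stage 1: recover under-exposed detail
        hdr = fusion([darker, ldr, brighter])  # stage 2: fuse the exposure stack
    print(hdr.shape)                   # torch.Size([1, 3, 128, 128])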

Table of Contents:
    Abstract (Chinese)
    Abstract
    Acknowledgements
    List of Contents
    List of Figures
    List of Tables
    Chapter 1: Introduction
        1.1 Motivation
        1.2 Contributions
        1.3 Thesis Organization
    Chapter 2: Related Works
        2.1 Multi-Image HDR Reconstruction
        2.2 Single-Image HDR Reconstruction
    Chapter 3: Proposed Methods
        3.1 Data Augmentation
            3.1.1 Random Scaling and Random Crop
            3.1.2 Random Flip and Random Rotation
            3.1.3 Gamma Correction
            3.1.4 Saturation Adjustment
            3.1.5 Brightness Adjustment
        3.2 Darker & Brighter Cycle Conditional GAN (DBC-cGAN)
            3.2.1 cGAN Architecture
            3.2.2 Spatial-Channel Attention Gate Block [32]
            3.2.3 Spatial Attention
            3.2.4 Channel Attention
        3.3 Image Fusion Network
            3.3.1 Fusion Network Architecture
            3.3.2 Convolutional Residual Block [38]
        3.4 Loss Function
            3.4.1 Reconstruction Loss
            3.4.2 Conditional Adversarial Loss
            3.4.3 μ-law Tone Mapping Reconstruction Loss
            3.4.4 Joint Learning [41]
    Chapter 4: Experimental Results
        4.1 Training Details
        4.2 Datasets
            4.2.1 HDR-SYNTH [6]
            4.2.2 HDR-REAL [6]
            4.2.3 LDR Images Classification
        4.3 Evaluation Metrics
            4.3.1 PSNR [54]
            4.3.2 SSIM [55]
        4.4 Comparisons with State-of-the-art Methods
            4.4.1 Quantitative Comparisons
            4.4.2 Qualitative Comparisons
        4.5 Ablation Studies
            4.5.1 Analysis on Alternatives of the DBC-cGAN
            4.5.2 Analysis on Alternatives of the Image Fusion Network
            4.5.3 Analysis on Alternatives of the Entire Model
    Chapter 5: Conclusions and Future Works
        5.1 Conclusions
        5.2 Future Works
    References
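Section 3.4.3 of the outline refers to a μ-law tone mapping reconstruction loss. μ-law tone mapping is the standard range-compression operator in deep HDR work such as Kalantari and Ramamoorthi [39]; a minimal sketch of a tone-mapped L1 loss follows. The setting μ = 5000 is the value common in that literature and is an assumption here, since the thesis defines its exact formulation in Section 3.4.3.

    import torch

    def mu_law_tonemap(hdr: torch.Tensor, mu: float = 5000.0) -> torch.Tensor:
        """T(H) = log(1 + mu * H) / log(1 + mu), for H normalized to [0, 1]."""
        return torch.log1p(mu * hdr) / torch.log1p(torch.tensor(mu))

    def tonemapped_l1_loss(pred_hdr: torch.Tensor, gt_hdr: torch.Tensor) -> torch.Tensor:
        """L1 loss in the tone-mapped domain, so errors in dark regions are
        not dwarfed by the huge linear range of the highlights."""
        return torch.mean(torch.abs(mu_law_tonemap(pred_hdr) - mu_law_tonemap(gt_hdr)))

    # Random tensors standing in for predicted and ground-truth HDR images.
    loss = tonemapped_l1_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
    print(loss.item())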

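The evaluation metrics listed in Sections 4.3.1 and 4.3.2 are PSNR [54] and SSIM [55]. PSNR is simple enough to compute by hand, as sketched below for images normalized to [0, 1]; SSIM involves local windowed statistics and is usually taken from a library implementation, so it is omitted here.

    import torch

    def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
        """PSNR = 10 * log10(MAX^2 / MSE); higher is better."""
        mse = torch.mean((pred - target) ** 2)
        return 10.0 * torch.log10(max_val ** 2 / mse)

    # A lightly perturbed copy of an image scores a high but finite PSNR.
    img = torch.rand(3, 64, 64)
    noisy = (img + 0.01 * torch.randn_like(img)).clamp(0.0, 1.0)
    print(psnr(img, noisy).item())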
References:
    [1] P. E. Debevec and J. Malik, "Recovering High Dynamic Range Radiance Maps from Photographs," ACM SIGGRAPH Conference on Computer Graphics and Interactive Techniques, pp. 369-378, 1997.
    [2] M. A. Robertson, S. Borman, and R. L. Stevenson, "Dynamic Range Improvement through Multiple Exposures," International Conference on Image Processing, vol. 3, pp. 159-163, 1999.
    [3] T. Mertens, J. Kautz, and F. V. Reeth, "Exposure Fusion," Pacific Conference on Computer Graphics and Applications, pp. 382-390, 2007.
    [4] D. Marnerides, T. Bashford-Rogers, J. Hatchett, and K. Debattista, "ExpandNet: A Deep Convolutional Neural Network for High Dynamic Range Expansion from Low Dynamic Range Content," Computer Graphics Forum, vol. 37, no. 2, pp. 37-49, 2018.
    [5] G. Eilertsen, J. Kronander, G. Denes, R. Mantiuk, and J. Unger, "HDR Image Reconstruction from A Single Exposure using Deep CNNs," ACM Transactions on Graphics, vol. 36, no. 6, article 178, pp. 1-15, 2017.
    [6] Y.-L. Liu, W.-S. Lai, Y.-S. Chen, Y.-L. Kao, M.-H. Yang, Y.-Y. Chuang, and J.-B. Huang, "Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline," IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1651-1660, 2020.
    [7] P. Raipurkar, R. Pal, and S. Raman, "HDR-cGAN: Single LDR to HDR Image Translation using Conditional GAN," arXiv:2110.01660, 2021.
    [8] Y. Endo, Y. Kanamori, and J. Mitani, "Deep Reverse Tone Mapping using Generative Adversarial Networks," ACM Transactions on Graphics, vol. 36, no. 6, article 177, pp. 1-10, 2017.
    [9] S. Lee, G. H. An, and S. J. Kang, "Deep Recursive HDRI: Inverse Tone Mapping using Generative Adversarial Networks," European Conference on Computer Vision, pp. 596-611, 2018.
    [10] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks," IEEE International Conference on Computer Vision, pp. 2242-2251, 2017.
    [11] R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision," Cambridge University Press, 2003.
    [12] S. Liu, P. Tan, L. Yuan, J. Sun, and B. Zeng, "MeshFlow: Minimum Latency Online Video Stabilization," European Conference on Computer Vision, pp. 800-815, 2016.
    [13] S. Baker, S. Roth, D. Scharstein, M. J. Black, J. P. Lewis, and R. Szeliski, "A Database and Evaluation Methodology for Optical Flow," International Journal of Computer Vision, vol. 92, no. 1, pp. 1-31, 2011.
    [14] K. R. Prabhakar, R. Arora, A. Swaminathan, K. P. Singh, and R. V. Babu, "A Fast, Scalable, and Reliable Deghosting Method for Extreme Exposure Fusion," IEEE International Conference on Computational Photography, pp. 1-8, 2019.
    [15] D. Sun, X. Yang, M.-Y. Liu, and J. Kautz, "PWC-Net: CNNs for Optical Flow using Pyramid, Warping, and Cost Volume," IEEE Conference on Computer Vision and Pattern Recognition, pp. 8934-8943, 2018.
    [16] Z. Liu, W. Lin, X. Li, Q. Rao, T. Jiang, M. Han, H. Fan, J. Sun, and S. Liu, "ADNet: Attention-guided Deformable Convolutional Network for High Dynamic Range Imaging," IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 463-470, 2021.
    [17] X. Wang, K. C. Chan, K. Yu, C. Dong, and C. C. Loy, "EDVR: Video Restoration with Enhanced Deformable Convolutional Networks," IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1954-1963, 2019.
    [18] S. Wu, J. Xu, Y.-W. Tai, and C.-K. Tang, "Deep High Dynamic Range Imaging with Large Foreground Motions," European Conference on Computer Vision, pp. 120-135, 2018.
    [19] Y. Niu, J. Wu, W. Liu, W. Guo, and R. W. Lau, "HDR-GAN: HDR Image Reconstruction from Multi-Exposed LDR Images with Large Motions," IEEE Transactions on Image Processing, vol. 30, pp. 3885-3896, 2021.
    [20] F. Banterle, P. Ledda, K. Debattista, and A. Chalmers, "Inverse Tone Mapping," International Conference on Computer Graphics and Interactive Techniques in Australasia and Southeast Asia, pp. 349-356, 2006.
    [21] H. Landis, "Production-Ready Global Illumination," SIGGRAPH Course Notes, vol. 16, p. 11, 2002.
    [22] C. Bist, R. Cozot, G. Madec, and X. Ducloux, "Tone Expansion using Lighting Style Aesthetics," Computers & Graphics, vol. 62, pp. 77-86, 2017.
    [23] A. O. Akyüz, R. Fleming, B. E. Riecke, E. Reinhard, and H. H. Bülthoff, "Do HDR Displays Support LDR Content? A psychophysical evaluation," ACM Transactions on Graphics, vol. 26, no. 3, pp. 38-es, 2007.
    [24] Y. Huo, F. Yang, L. Dong, and V. Brost, "Physiological Inverse Tone Mapping Based on Retina Response," The Visual Computer, vol. 30, no. 5, pp. 507-517, 2014.
    [25] R. P. Kovaleski and M. M. Oliveira, "High-quality Reverse Tone Mapping for A Wide Range of Exposures," IEEE Conference on Graphics, Patterns, and Images, pp. 49-56, 2014.
    [26] M. D. Grossberg and S. K. Nayar, "Modeling the Space of Camera Response Functions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 10, pp. 1272-1282, 2004.
    [27] Y.-W. Tai, X. Chen, S. Kim, S. J. Kim, F. Li, J. Yang, J. Yu, Y. Matsushita, and M. S. Brown, "Nonlinear Camera Response Functions and Image Deblurring: Theoretical Analysis and Practice," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 10, pp. 2498-2512, 2013.
    [28] J. Han, P. Fang, W. Li, J. Hong, M. A. Armin, I. Reid, L. Petersson, and H. Li, "You Only Cut Once: Boosting Data Augmentation with A Single Cut," arXiv:2201.12078, 2022.
    [29] A. R. Smith, "Color Gamut Transform Pairs," ACM SIGGRAPH Computer Graphics, vol. 12, no. 3, pp. 12-19, 1978.
    [30] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125-1134, 2017.
    [31] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241, 2015.
    [32] T. L. B. Khanh, D.-P. Dao, N.-H. Ho, H.-J. Yang, E.-T. Baek, G. Lee, S.-H. Kim, and S. B. Yoo, "Enhancing U-Net with Spatial-Channel Attention Gate for Abnormal Tissue Segmentation in Medical Imaging," Applied Sciences, vol. 10, no. 17, article 5729, 2020.
    [33] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional Block Attention Module," European Conference on Computer Vision, pp. 3-19, 2018.
    [34] A. G. Roy, N. Navab, and C. Wachinger, "Recalibrating Fully Convolutional Networks with Spatial and Channel “Squeeze and Excitation” Blocks," IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 540-549, 2019.
    [35] G. R. KS, A. Biswas, M. S. Patel, and B. P. Prasad, "Deep Multi-stage Learning for HDR with Large Object Motions," IEEE International Conference on Image Processing, pp. 4714-4718, 2019.
    [36] R. Sharma and E. N. Singh, "Comparative Study of Different Low Level Feature Extraction Techniques," International Journal of Engineering Research and Technology, vol. 3, no. 4, pp. 1454-1460, 2014.
    [37] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
    [38] L. Wang, S. Wang, R. Chen, X. Qu, Y. Chen, S. Huang, and C. Liu, "Nested Dilation Networks for Brain Tumor Segmentation Based on Magnetic Resonance Imaging," Frontiers in Neuroscience, vol. 13, article 285, 2019.
    [39] N. K. Kalantari and R. Ramamoorthi, "Deep High Dynamic Range Imaging of Dynamic Scenes," ACM Transactions on Graphics, vol. 36, no. 4, pp. 144:1-144:12, 2017.
    [40] E. Pérez-Pellitero, S. Catley-Chandar, A. Leonardis, and R. Timofte, "NTIRE 2021 Challenge on High Dynamic Range Imaging: Dataset, Methods and Results," IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 691-700, 2021.
    [41] A. Sayers, J. Heron, A. D. Smith, C. Macdonald-Wallis, M. S. Gilthorpe, F. Steele, and K. Tilling, "Joint Modelling Compared with Two Stage Methods for Analysing Longitudinal Data and Prospective Outcomes: A Simulation Study of Childhood Growth and BP," Statistical Methods in Medical Research, vol. 26, no. 1, pp. 437-452, 2017.
    [42] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic Differentiation in PyTorch," NIPS Autodiff Workshop, 2017.
    [43] K. He, X. Zhang, S. Ren, and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," IEEE International Conference on Computer Vision, pp. 1026-1034, 2015.
    [44] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv:1412.6980, 2014.
    [45] M. D. Fairchild, "The HDR Photographic Survey," Color and Imaging Conference, Society for Imaging Science and Technology, vol. 2007, no. 1, pp. 233-238, 2007.
    [46] B. Funt and L. Shi, "The Effect of Exposure on MaxRGB Color Constancy," Human Vision and Electronic Imaging XV, vol. 7527, pp. 282-288, 2010.
    [47] B. Funt and L. Shi, "The Rehabilitation of MaxRGB," Color and Imaging Conference, Society for Imaging Science and Technology, vol. 2010, no. 1, pp. 256-259, 2010.
    [48] E. Reinhard, W. Heidrich, P. Debevec, S. Pattanaik, G. Ward, and K. Myszkowski, "High Dynamic Range Imaging: Acquisition, Display, and Image-based Lighting," Morgan Kaufmann, 2010.
    [49] G. Ward, "High Dynamic Range Image Encodings," 2006.
    [50] F. Xiao, J. M. DiCarlo, P. B. Catrysse, and B. A. Wandell, "High Dynamic Range Imaging of Natural Scenes," Color and Imaging Conference, Society for Imaging Science and Technology, vol. 2002, no. 1, pp. 337-342, 2002.
    [51] R. Mantiuk, "Pfstools HDR gallery," http://pfstools.sourceforge.net/hdr_gallery.html.
    [52] M. D. Grossberg and S. K. Nayar, "What is the Space of Camera Response Functions?," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-602, 2003.
    [53] HDRsoft Ltd, "Photomatix," https://www.hdrsoft.com/.
    [54] D. H. Johnson, "Signal-to-noise Ratio," Scholarpedia, vol. 1, no. 12, p. 2088, 2006.
    [55] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: from Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.
    [56] A. Odena, V. Dumoulin, and C. Olah, "Deconvolution and Checkerboard Artifacts," Distill, vol. 1, no. 10, p. e3, 2016.
    [57] C. Saharia, W. Chan, H. Chang, C. A. Lee, J. Ho, T. Salimans, D. J. Fleet, and M. Norouzi, "Palette: Image-to-Image Diffusion Models," arXiv:2111.05826, 2021.
    [58] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, "Masked Autoencoders are Scalable Vision Learners," IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16000-16009, 2022.

    Full-text release date: 2024/08/15 (campus network)
    Full-text release date: 2024/08/15 (off-campus network)
    Full-text release date: 2024/08/15 (National Central Library: Taiwan NDLTD system)