
Graduate Student: 陳聖瑾 (Sn-Jin Chen)
Thesis Title: 神經風格轉換之影像品質與色彩優化 (Optimization on Image and Color Quality for Neural Style Transfer)
Advisor: 孫沛立 (Pei-Li Sun)
Committee Members: 林宗翰 (Tzung-Han Lin), 羅梅君 (Mei-Chun Lo), 孫沛立 (Pei-Li Sun)
Degree: Master
Department: 應用科技學院 - Graduate Institute of Color and Illumination Technology
Publication Year: 2023
Graduation Academic Year: 112
Language: Chinese
Number of Pages: 108
Chinese Keywords: 風格轉換, 色彩轉換, 多層級特徵圖, 影像超解析處理, 影像品質評估
Foreign Keywords: style transfer, color transfer, multi-layer feature maps, image super-resolution, image quality assessment

Neural style transfer is an image generation technique that uses an artificial neural network to extract image features and transfer a target style image onto a target content image. In the newly generated image, the positions of object outlines must approximate the content image, while the texture and color features, although spatially very different from the style image, should exhibit similar visual characteristics. However, the quality of the generated images is not always satisfactory: blurred style features, incomplete color transfer, and content outlines distorted or blurred by interference from style features are common problems, and generating high-resolution images also incurs high computational cost and long processing time.
The purpose of this study is to remedy these defects of neural style transfer and, further, to split the style image into two target images, one for texture and one for color, so that the images generated by the style transfer model can approximate three different target images in terms of texture, color, and content outline, respectively.
To achieve these goals, four experiments were conducted, described as follows:
Experiment 1, "Style transfer model comparison": compare the performance of different style transfer models and identify the key parameters that affect the generated results. Style transfer is also combined with image super-resolution and sharpening to examine how these processes influence image quality and computational cost. The outcome is a low-computational-cost style transfer algorithm and its key parameters for use in the subsequent experiments.
Experiment 2, "Parameter optimization based on image analysis": for the key parameter obtained in Experiment 1, features are extracted from the input content and style images to obtain quantitative indexes, and a prediction model is then derived by linear regression to predict the optimal content weight of the total loss function. The optimal content weight can thus be set before style transfer, reducing trial-and-error time.
Experiment 3, "Multi-layer VGG feature map optimization": extending the optimized model of Experiment 2, the weights of content outline regions are strengthened. Contour detection is combined with information from multiple VGG feature map layers so that richer style textures appear while the content remains clear.
Experiment 4, "Texture and color transfer": the optimized style transfer models of Experiments 2 and 3 are extended by splitting the style image into a texture target image and a color target image. Through two-stage style transfer, the features of the generated image approximate the texture image, the color image, and the content image in terms of texture, color, and content, respectively.


Neural Style Transfer (NST) is an image generation technique that transfers the style of a target style image onto a target content image using an artificial neural network. In the newly generated image, the object outlines must be similar to those of the content image, while the texture and color features should resemble the style image even though their spatial positions differ completely. However, the quality of the generated images is not always satisfactory: style features are often blurred, color transfer is incomplete, content outlines are distorted and blurred by interference from style features, and generating high-resolution images incurs high computational cost and long processing time.
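As a point of reference for the loss that classic NST (Gatys et al.) optimizes, the following is a minimal PyTorch sketch of a total loss built from a VGG-19 content term and Gram-matrix style terms. The layer indices, the weights, and the assumption of ImageNet-normalized inputs are illustrative choices, not the settings used in this thesis.

    import torch.nn.functional as F
    from torchvision.models import vgg19

    def gram_matrix(feat):
        # feat: (1, C, H, W) feature map -> (C, C) Gram matrix of channel correlations
        _, c, h, w = feat.shape
        f = feat.view(c, h * w)
        return f @ f.t() / (c * h * w)

    vgg = vgg19(weights="IMAGENET1K_V1").features.eval()  # pretrained VGG-19 (torchvision assumed)
    CONTENT_LAYERS = {21}              # conv4_2 (illustrative choice)
    STYLE_LAYERS = {0, 5, 10, 19, 28}  # conv1_1 ... conv5_1 (illustrative choice)

    def nst_loss(gen, content, style, content_weight=1.0, style_weight=1e4):
        # gen, content, style: (1, 3, H, W) ImageNet-normalized tensors of the same size
        c_losses, s_losses = [], []
        x, xc, xs = gen, content, style
        for i, layer in enumerate(vgg):
            x, xc, xs = layer(x), layer(xc), layer(xs)
            if i in CONTENT_LAYERS:
                c_losses.append(F.mse_loss(x, xc))
            if i in STYLE_LAYERS:
                s_losses.append(F.mse_loss(gram_matrix(x), gram_matrix(xs)))
        return content_weight * sum(c_losses) + style_weight * sum(s_losses)

In this formulation the content weight controls how strongly the generated image is pulled toward the content outlines, which is the parameter Experiment 2 later tries to predict.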
The purpose of this study is to improve the above-mentioned image defects of NST and, further, to divide the style image into two target images, one for texture and one for color. This enables the image generation results of the NST model to approximate three different target images in terms of texture, color, and content, respectively.
To achieve the research objectives, four experiments were conducted in this study, described as follows:
Experiment 1 – NST model comparison: We compare the performance of different NST models to identify the key parameters that affect the generated results. We also combine NST with image super-resolution and sharpening to investigate how these processes affect image quality and computational cost. A low-computational-cost NST algorithm and its key parameters are obtained to support the subsequent experiments.
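The super-resolution and sharpening step can be pictured with the small OpenCV sketch below, which upscales a stylized result and applies unsharp masking. Bicubic upscaling stands in here for the SR networks reviewed in the thesis (SRCNN, VDSR, SRGAN), and the file names and parameter values are placeholders.

    import cv2

    def upscale_and_sharpen(img_bgr, scale=2, amount=0.8, sigma=1.5):
        # Bicubic upscaling as a stand-in for an SR network; unsharp masking
        # then restores edge contrast: out = img + amount * (img - blurred).
        h, w = img_bgr.shape[:2]
        up = cv2.resize(img_bgr, (w * scale, h * scale), interpolation=cv2.INTER_CUBIC)
        blurred = cv2.GaussianBlur(up, (0, 0), sigma)
        return cv2.addWeighted(up, 1.0 + amount, blurred, -amount, 0)

    # Placeholder file names for illustration only.
    result = upscale_and_sharpen(cv2.imread("stylized.png"))
    cv2.imwrite("stylized_sr_sharpened.png", result)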
Experiment 2 – Parameter optimization based on image analysis: Focusing on the key parameter obtained in Experiment 1, we extract features from the input content and style images to obtain quantitative indexes. A prediction model is then derived through linear regression to predict the optimal content weight of the total loss function. This allows the best parameter value to be set before NST, reducing trial-and-error time.
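The content-weight prediction idea can be sketched as an ordinary least-squares fit from image statistics to a content weight. The two features and the training values below are placeholders, not the quantitative indexes actually used in Experiment 2.

    import numpy as np

    # X: one row per content/style image pair; columns are quantitative indexes
    # computed from the images beforehand (placeholder values and features).
    X = np.array([[0.12, 0.45],
                  [0.30, 0.20],
                  [0.25, 0.60],
                  [0.08, 0.33]])
    # y: content weights that gave good results for those pairs (placeholder values).
    y = np.array([5.0, 12.0, 8.0, 4.0])

    # Ordinary least squares with an intercept term.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predict_content_weight(features):
        # features: the same indexes computed for a new image pair
        return float(np.append(features, 1.0) @ coef)

    print(predict_content_weight([0.20, 0.40]))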
Experiment 3 – Multi-layer feature map optimization based on the VGG model: Extending the optimization model of Experiment 2, the weight of the content contour regions is increased. Combining contour detection with VGG multi-layer feature maps yields generated images with richer style textures while maintaining content clarity.
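One way to picture the contour weighting is the sketch below, which builds a spatial weight map from Canny edges of the content image and uses it to boost the content loss of a VGG feature map inside contour regions. The edge detector, the resizing scheme, and the boost factor are assumptions for illustration only, not the thesis's exact scheme.

    import cv2
    import torch
    import torch.nn.functional as F

    def contour_weight_map(content_gray, feat_h, feat_w, boost=4.0):
        # content_gray: uint8 grayscale content image; returns a (1, 1, feat_h, feat_w) map
        edges = cv2.Canny(content_gray, 100, 200).astype("float32") / 255.0
        m = torch.from_numpy(edges)[None, None]
        m = F.interpolate(m, size=(feat_h, feat_w), mode="bilinear", align_corners=False)
        return 1.0 + boost * m          # >= 1 everywhere, larger on contours

    def weighted_content_loss(gen_feat, content_feat, weight_map):
        # gen_feat, content_feat: (1, C, H, W) feature maps from one VGG layer
        return ((gen_feat - content_feat) ** 2 * weight_map).mean()

Applying such a map per VGG layer lets the contour regions follow the content image closely while the remaining area is free to take on more style texture.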
Experiment 4 – Texture and color transfer: The optimized models of Experiments 2 and 3 are extended by splitting the style image into a texture target image and a color target image. Through a two-stage NST, the features of the generated image resemble the texture image, the color image, and the content image in terms of texture, color, and content, respectively.
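The second-stage color step can be illustrated by a simple channel replacement in CIELAB: keep the lightness channel of the first-stage (texture) result and take the chroma channels from the color target. This is only a hedged approximation of the channel matching and replacement described in Chapter 7; the color space, the plain swap, and the file names are simplifying assumptions.

    import cv2

    def replace_chroma(texture_bgr, color_target_bgr):
        # Convert both images to CIELAB, keep L* from the texture result,
        # and take a*/b* from the color target (resized to match).
        lab_tex = cv2.cvtColor(texture_bgr, cv2.COLOR_BGR2LAB)
        lab_col = cv2.cvtColor(color_target_bgr, cv2.COLOR_BGR2LAB)
        lab_col = cv2.resize(lab_col, (lab_tex.shape[1], lab_tex.shape[0]))
        lab_tex[:, :, 1:] = lab_col[:, :, 1:]
        return cv2.cvtColor(lab_tex, cv2.COLOR_LAB2BGR)

    # Placeholder file names for illustration only.
    out = replace_chroma(cv2.imread("stage1_texture.png"), cv2.imread("color_target.png"))
    cv2.imwrite("stage2_color.png", out)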

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Thesis Outline
Chapter 2 Literature Review
2.1 Neural Network Architectures
2.1.1 Artificial Neural Networks (ANN)
2.1.2 Convolutional Neural Networks (CNN)
2.1.3 Generative Adversarial Networks (GAN)
2.2 Image Style Transfer
2.2.1 Neural Style Transfer (NST)
2.2.2 Color Preservation in Style Transfer
2.2.3 Fast Neural Style Transfer (FNST)
2.2.4 The Missing Ingredient of Fast Style Transfer: Instance Normalization
2.2.5 Fast Multi-Style Transfer
2.2.6 Fast Arbitrary Style Transfer
2.2.7 AdaIN
2.2.8 CycleGAN
2.2.9 DualGAN
2.2.10 Stability of Style Transfer
2.3 Image Texture Synthesis
2.3.1 Texture Synthesis Based on Convolutional Neural Networks
2.3.2 Texture Synthesis and Style Transfer with Feed-Forward Networks
2.4 Image Super-Resolution (SR)
2.4.1 SRCNN
2.4.2 SRGAN
2.4.3 VDSR
Chapter 3 Research Methods
3.1 Hardware and Environment
3.2 Research Process and Framework
3.3 Experimental and Test Images
3.3.1 Experimental Images
3.3.2 Test Images
Chapter 4 Experiment 1: Style Transfer Model Comparison
4.1 Comparison of NST and NTP Models
4.2 Images Generated by NST, NTP, and CW-NTP
4.3 Quantitative Analysis
4.3.1 Contour Contrast Analysis
4.3.2 Fourier Spectrum Analysis
4.3.3 Color Difference Evaluation
4.3.4 Quantitative Analysis Results
4.3.5 Summary of Quantitative Analysis
4.4 Qualitative Analysis
4.4.1 Human Factors Metrics
4.4.2 Experimental Setup and Procedure
4.4.3 Qualitative Analysis Results
4.4.4 Summary of Qualitative Analysis
4.5 Face Region Optimization
4.6 Time Cost of Image Generation
4.7 Summary of Experiment 1
Chapter 5 Experiment 2: Parameter Optimization Based on Image Analysis
5.1 Analysis of Generated-Image Variables
5.2 Image Analysis and Quantification
5.2.1 Detail Feature Analysis
5.2.2 Structural Feature Analysis
5.3 Visual Perception Analysis
5.4 Content Weight Prediction Model
5.5 Analysis of the Content Weight Prediction Model
5.6 Image Tests for Experiment 2
5.7 Summary of Experiment 2
Chapter 6 Experiment 3: VGG Multi-Layer Feature Map Optimization
6.1 Optimization of Shallow VGG Feature Maps
6.2 Optimization of Middle VGG Feature Maps
6.3 Optimization of Multi-Layer VGG Feature Maps
6.4 Results of Experiment 3
6.5 Image Tests for Experiment 3
6.6 Comparison of Test Images from Experiments 2 and 3
6.7 Summary of Experiment 3
Chapter 7 Experiment 4: Texture and Color Transfer
7.1 First-Stage Style Transfer
7.2 Channel Image Matching and Replacement
7.3 Results of Experiment 4
7.4 Image Tests for Experiment 4
7.5 Other Test Images
7.6 Summary of Experiment 4
Chapter 8 Conclusions and Suggestions
8.1 Conclusions
8.2 Suggestions
References

