
Graduate Student: Chia-Ching Chang (張家菁)
Thesis Title: StyleGAN-based Manga Screentone Generation (基於風格生成對抗網路在漫畫網點生成)
Advisor: Yu-Chi Lai (賴祐吉)
Committee Members: Yi-Ling Chen (陳怡伶), Yi-Yu Liu (劉一宇), Shih-Syun Lin (林士勛)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111 (2022-2023)
Language: Chinese
Number of Pages: 40
Keywords (Chinese): 漫畫 (manga), 網點 (screentone), 風格生成對抗網路 (StyleGAN)
Keywords (English): Manga, Screentone, StyleGAN

Online manga reading has become a global trend. However, when traditional hand-drawn manga is scanned into bitmap images, the inherent resolution limits of the scanning equipment may fail to fully capture the fine screentone textures. Image quality and screentone information can therefore be distorted: noise deforms the edges of the screentone texture, and low image quality can blur the screentone altogether, degrading the overall visual effect so that the manga loses its original style or departs from what it originally expressed. For readers, this means a poorer reading experience. Generating high-quality screentone is therefore essential to improving online manga reading.

Previous studies have attempted to reconstruct manga screentone or restore low-quality screentone, but they focused mainly on overall manga similarity rather than on the generated screentone itself. When the network judges generation quality in order to correct itself, it is misled by this extraneous information and cannot concentrate on the distribution and shape of the screentone, which makes screentone learning difficult. In particular, the kernels of conventional convolutional neural networks (CNNs) are not rotation-consistent, which introduces errors in the propagation of positional information and tends to distort the shape of the generated screentone.

This thesis proposes a screentone generation method based on the style-based generative adversarial network (StyleGAN). StyleGAN introduces rotation consistency, so positional information is propagated faithfully and screentone with consistent spacing can be produced. However, StyleGAN generates images of a fixed size, whereas we need screentone large enough to cover the original screentone region, and larger images require larger networks to generate and discriminate the image content, which greatly increases training cost. We therefore modify the positional information and the training procedure of StyleGAN so that it learns positional information over a larger field of view, and we specify offsets for tiling so that screentone of unrestricted size can be produced to cover the original region. Using generated circular and repeating screentone patterns as training data, our network learns to generate high-quality screentone on demand. An ablation study confirms that our modifications effectively enable the generation of large screentone. Visual comparison with other methods shows that our generated screentone maintains its spacing and has higher quality, demonstrating that our results outperform previous methods; we also compute the feature-distribution distance between each method's results and real screentone to show that our results are visually closer to the real screentone. In addition, sampling strategy and training size are two key factors in training deep learning models. A series of experiments on each shows that random and stratified sampling perform comparably in convergence speed and performance, and that smaller training images converge more readily than larger ones across repeated runs.
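The feature-distribution distance referred to above is the Fréchet Inception Distance (FID; see Section 4.4 in the table of contents). For reference, with mu_r, Sigma_r and mu_g, Sigma_g denoting the mean and covariance of Inception features of real and generated screentone respectively (symbols introduced here only for illustration), the standard definition is

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

and a lower value indicates that the generated screentone is statistically closer to real screentone.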
Moreover, the proposed screentone generator is not only suited to online manga reading; it can also serve as a screentone generation tool for manga artists, helping them combine and fine-tune screentone styles or find inspiration during the creative process.


Online manga reading has emerged as a global trend. However, when traditional hand-drawn manga is scanned into bitmap images, the inherent resolution limitations of scanning equipment mean the scan may only partially capture the subtle screentone textures. Consequently, image quality and screentone information can become distorted: the edges of the screentone texture may be deformed by noise, and the screentone may even become unclear at low image quality, harming the overall visual effect. This can cause the manga to lose its original style or diverge from the content it originally conveyed and, from the reader's perspective, results in a degraded reading experience. Therefore, generating high-quality screentone is essential for enhancing the online manga reading experience.
Past research has tried to reconstruct manga screentone or restore low-quality screentone, focusing primarily on overall manga similarity rather than on the generated screentone itself. As a result, the network is misled by extraneous information when judging generation quality for correction, making it difficult to focus on the distribution and shape of the screentone and thus making screentone learning hard. In particular, the kernels used in traditional convolutional neural networks are not rotation-consistent, causing errors in the transmission of positional information and potentially distorting the shape of the generated screentone.
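As a minimal, self-contained illustration of this rotation-consistency issue (a sketch for intuition only, not code from the thesis), the following NumPy/SciPy snippet shows that filtering with an ordinary anisotropic convolution kernel does not commute with rotating the image:

import numpy as np
from scipy.ndimage import convolve, rotate

# A small direction-sensitive kernel (horizontal edge detector).
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])

rng = np.random.default_rng(0)
img = rng.random((64, 64))

# Rotate the image first, then filter ...
rotate_then_filter = convolve(rotate(img, 90), kernel)
# ... versus filter first, then rotate the response.
filter_then_rotate = rotate(convolve(img, kernel), 90)

# A plain convolution kernel is not rotation-equivariant, so the two
# results differ noticeably.
print(np.abs(rotate_then_filter - filter_then_rotate).max())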
This paper proposes a screentone generation method based on StyleGAN. StyleGAN introduces rotation consistency, allowing precise transmission of positional information and thus enabling the production of evenly spaced screentone. However, the size of the image generated by StyleGAN is fixed, and we need a sufficiently large screentone to cover the area of the original screentone; generating larger images requires larger networks to synthesize and discriminate the image content, significantly increasing training costs. We therefore modify the positional information and the training procedure in StyleGAN so that it can learn from a larger field of view, and we designate offsets for tiling patches to generate screentone of unrestricted size that covers the original screentone area.
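The actual network is detailed in Chapter 3; the sketch below only illustrates the tiling idea (the helper fourier_feature_grid and its parameters are illustrative assumptions, not the thesis implementation). Because the Fourier features depend only on absolute pixel coordinates, a patch generated at any offset continues the same periodic pattern as its neighbours, so fixed-size patches can be stitched into screentone of unrestricted size:

import numpy as np

def fourier_feature_grid(height, width, offset_y=0, offset_x=0,
                         num_freqs=16, scale=8.0, seed=0):
    # Map an (optionally offset) absolute pixel grid to Fourier features.
    # The random 2-D frequencies are fixed by the seed, so every patch
    # samples the same underlying periodic functions.
    rng = np.random.default_rng(seed)
    freqs = rng.normal(scale=scale, size=(num_freqs, 2))
    ys = (np.arange(height) + offset_y) / height
    xs = (np.arange(width) + offset_x) / width
    grid = np.stack(np.meshgrid(ys, xs, indexing="ij"), axis=-1)    # (H, W, 2)
    phase = 2.0 * np.pi * grid @ freqs.T                            # (H, W, num_freqs)
    return np.concatenate([np.sin(phase), np.cos(phase)], axis=-1)  # (H, W, 2*num_freqs)

# Two neighbouring 64x64 patches: their feature values are continuous
# across the seam, so generated patches can be tiled into arbitrarily
# large screentone.
left = fourier_feature_grid(64, 64, offset_x=0)
right = fourier_feature_grid(64, 64, offset_x=64)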
By generating circular and repeating screentone patterns as training data, we trained our network to create high-quality screentone on demand. Our ablation study validates the effectiveness of our improvements for generating large screentone. Compared with other methods, ours visually maintains screentone spacing and achieves higher generation quality, indicating that our results outperform previous methods. We also compare the results of each method with real screentone by computing the distance between their feature distributions, verifying that the results generated in this work are visually closer to real screentone. In addition, sampling strategy and training size are two crucial factors in training deep learning models. A series of experiments on each shows that random and stratified sampling perform comparably in terms of convergence speed and performance, and that smaller training images converge more readily than larger ones across multiple runs.
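Section 4.1 describes how the training screentone is actually constructed; the sketch below merely illustrates one way a regular, circular-dot pattern of the kind mentioned above could be rendered (dot_screentone and its spacing, radius, and angle parameters are illustrative assumptions):

import numpy as np
from PIL import Image

def dot_screentone(size=256, spacing=12, radius=4, angle_deg=45.0):
    # Render a regular dot screentone: black circles on a rotated lattice.
    yy, xx = np.mgrid[0:size, 0:size].astype(np.float64)
    theta = np.deg2rad(angle_deg)
    # Rotate the coordinate frame so the dot lattice sits at the given angle.
    u = xx * np.cos(theta) + yy * np.sin(theta)
    v = -xx * np.sin(theta) + yy * np.cos(theta)
    # Offset of each pixel from the centre of its lattice cell.
    du = (u % spacing) - spacing / 2.0
    dv = (v % spacing) - spacing / 2.0
    inside = du * du + dv * dv <= radius * radius
    return Image.fromarray(np.where(inside, 0, 255).astype(np.uint8), mode="L")

# Example: a 256x256 patch with 10-pixel dot spacing, written to disk.
dot_screentone(spacing=10, radius=3).save("screentone_sample.png")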
Moreover, our screentone generator is not only suitable for online manga reading but can also serve as a screentone generation tool for manga artists during drawing, helping them combine and fine-tune screentone styles or find inspiration during the creative process.

Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
1 Introduction
  1.1 Problem Definition
  1.2 Main Contributions
  1.3 Thesis Organization
2 Related Work
  2.1 Parameterization-Based Generation
  2.2 Texture-Synthesis-Based Methods
  2.3 Deep-Learning-Based Methods
  2.4 Generative Adversarial Networks (GANs)
3 Screentone Generator
  3.1 Screentone Generator Network
    3.1.1 Network Architecture
    3.1.2 Fourier Features
  3.2 Learning from a Larger Field of View
  3.3 Generating Screentone
  3.4 Loss Functions
4 Experimental Design, Results, and Discussion
  4.1 Screentone Training Data Generation
    4.1.1 Generating Regular Screentone Composed of Basic Shapes
    4.1.2 Generating Regular Screentone Not Composed of Basic Shapes
  4.2 Ablation Study
  4.3 Comparison of Results
  4.4 Fréchet Inception Distance (FID) Validation
  4.5 Stratified vs. Random Sampling Experiments
  4.6 Training Size Experiments
5 Conclusion and Future Work
References

