
Graduate Student: John Jethro Corpuz Virtusio
Thesis Title: Towards a General Approach to Controllable Style Transfer
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Kai-Lung Hua (花凱龍), Wen-Huang Cheng (鄭文皇), Yung-Yao Chen (陳永耀), Yi-Hui Chen (陳宜惠), Shih-Wei Sun (孫士韋)
Degree: Doctoral
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111 (ROC calendar)
Language: English
Number of Pages: 105
Keywords (in English): Controllable Style Transfer, Anchor Styles

  • Neural Style Transfer (NST) pioneered the idea of using CNN features to capture and migrate style. It produces high-quality stylizations, demonstrating the capacity to reproduce perceptual elements (e.g., color, strokes, textures, shapes) widely regarded as artistic style. Furthermore, it has opened new avenues for algorithmic image manipulation. Despite its impact, there is no easy way to control or influence its output: NST relies on millions of parameters, and there is no direct way of knowing which ones correspond to specific perceptual attributes. Controlling Neural Style Transfer has numerous applications and can significantly enrich creativity tools. In this dissertation, I propose methods for controllable Neural Style Transfer. I first give an overview of a method that controls a specific style attribute (pattern density), and then extend this work to the Neural Style Palette (NSP). Unlike other works in controllable style transfer, NSP does not aim to control specific visual attributes. Instead, it separates a single style image into multiple sub-styles (i.e., palettes), each of which captures a unique part of the original style's perceptual attributes. This approach is more general because it can adaptively choose salient style attributes to control. Moreover, it allows users to mix and match sub-styles extracted from a single style image, dramatically increasing their control over the stylization process. Overall, NSP offers a novel take on controllable style transfer. It has two main components: (1) automatic palette generation from a single style input and (2) user-controlled palette recombination.
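    To make the mix-and-match idea above concrete, the short PyTorch sketch below shows the standard Gram-matrix style loss from Gatys-style NST, with the single style target replaced by a user-weighted blend of several anchor-style (sub-style) statistics. This is only an illustrative sketch: the function names (gram_matrix, blended_style_loss), the inputs (anchor_feats, weights), and the choice of Gram statistics as the style representation are assumptions for exposition, not the dissertation's actual formulation, which instead uses dedicated style separation and unification losses (see Sections 5.2.1 and 5.2.2 in the table of contents below).

        import torch

        def gram_matrix(feat):
            # feat: CNN feature map of shape (C, H, W) from a pretrained network (e.g., VGG)
            c, h, w = feat.shape
            f = feat.reshape(c, h * w)
            return (f @ f.t()) / (c * h * w)

        def blended_style_loss(gen_feats, anchor_feats, weights):
            # gen_feats:    per-layer feature maps of the image being stylized
            # anchor_feats: for each anchor (sub-style) image, its per-layer feature maps
            # weights:      user-chosen mixing weights, one per anchor, summing to 1
            loss = 0.0
            for layer, gf in enumerate(gen_feats):
                # Style target at this layer: weighted blend of the anchors' Gram statistics.
                target = sum(w * gram_matrix(a[layer]) for w, a in zip(weights, anchor_feats))
                loss = loss + torch.mean((gram_matrix(gf) - target) ** 2)
            return loss

    In this hypothetical setup, the user's "palette recombination" reduces to choosing the weights vector; setting one weight to 1 recovers stylization with a single sub-style, while intermediate weights interpolate between them.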

    Recommendation Letter
    Approval Letter
    Abstract in English
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
      1.1 Neural Style Transfer
      1.2 Controllable Neural Style Transfer
        1.2.1 Controlling Pattern Density
      1.3 Towards Controlling General Style Attributes
    2 Neural Style Palette
    3 Related Work
      3.1 Style Transfer
      3.2 Multimodal Style Transfer
      3.3 Latent Space
    4 Preliminaries
      4.1 Background
      4.2 Discussion
        4.2.1 Motivation
        4.2.2 Hybrid Human-Artificial Intelligence
    5 Proposed Method
      5.1 Overview
      5.2 Generating anchor styles
        5.2.1 Style separation loss
        5.2.2 Style unification loss
        5.2.3 Objective Function
      5.3 Neural Style Palette: Interactive Blending
      5.4 Style Transfer and Neural Style Palette
      5.5 Fast Generation of Anchor Styles
    6 Experimental Results
      6.1 Implementation Details
      6.2 Baseline comparison
        6.2.1 Overview
        6.2.2 Distribution Analysis
        6.2.3 Loss Analysis
        6.2.4 User Study
      6.3 Ablation Study
      6.4 Style Representation
      6.5 Different art techniques and movements
      6.6 Number of Anchor Styles
        6.6.1 Anchor Style Diversity
        6.6.2 Optimization Stability
        6.6.3 Choosing the Number of Anchor Styles
      6.7 Interpolating Anchor Styles
      6.8 Generalization to other models
    7 Conclusions
    8 Additional Experiments
      8.1 Implementation Details
      8.2 Anchor Style Number
      8.3 Different Margins
        8.3.1 Increasing λμ
        8.3.2 Increasing λσ
        8.3.3 Increasing λμ and λσ
      8.4 Qualitative Experiments
    References

