
Graduate Student: John Jethro Virtusio
Thesis Title: Enabling Control over Strokes and Pattern Density of Style Transfer using Covariance Matrix
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Conrado Ruiz, Jing-Ming Guo (郭景明), Kuo-Liang Chung (鍾國亮), Yu-Chi Lai (賴祐吉)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 107
Language: English
Number of Pages: 47
Keywords: Style Transfer, Neural Network, Deep Learning
Abstract: Despite the remarkable results and numerous advancements in neural style transfer, enabling artistic freedom through control over perceptual factors such as pattern density and stroke strength remains a challenging problem. A recent work on fast stylization networks offers some degree of control over pattern density by changing the resolution of the inputs; however, that solution requires a dedicated network architecture that can only accommodate a predefined set of resolutions. In this work, we propose a much simpler solution by addressing a fundamental limitation of neural style transfer models that use the Gram matrix as their style representation. More specifically, we replace the Gram matrix with a covariance matrix in order to better capture negative spatial correlations. We show that this simple modification allows the model to handle a wider range of input resolutions. We also show that selectively manipulating the covariance matrix allows us to control stroke strength independently of pattern density. Our method compares favorably against several state-of-the-art neural style transfer models. Moreover, since our approach focuses on manipulating and improving the Gram matrix, it does not depend on any particular network architecture. This means that all advancements in neural style transfer that use the Gram matrix as their style representation can directly benefit from our findings.
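To make the distinction the abstract draws more concrete, the following is a minimal NumPy sketch of how a Gram matrix and a covariance matrix can be computed from a single convolutional feature map: the only difference is the per-channel mean subtraction, which is what lets negative spatial correlations appear in the covariance. The (C, H, W) layout, the normalization, and the scale_diagonal helper are illustrative assumptions made for this sketch, not the implementation described in the thesis.

import numpy as np


def gram_matrix(feats):
    """Gram matrix of a convolutional feature map.

    feats: array of shape (C, H, W), i.e. C feature channels from one CNN layer.
    Returns a (C, C) matrix of channel-to-channel inner products.
    """
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)            # flatten the spatial dimensions
    return (f @ f.T) / (h * w)             # normalized inner products


def covariance_matrix(feats):
    """Covariance matrix of a convolutional feature map.

    Identical to the Gram matrix except that each channel is centered by its
    spatial mean first, so channels that fire in complementary regions yield
    negative entries instead of being drowned out by large positive means.
    """
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)
    f = f - f.mean(axis=1, keepdims=True)  # per-channel mean subtraction
    return (f @ f.T) / (h * w)


def scale_diagonal(cov, alpha):
    """Hypothetical example of a selective manipulation: rescale the
    per-channel variances (the diagonal) while leaving the cross-channel
    correlations untouched. The thesis's actual masking scheme may differ."""
    diag = np.diag(np.diag(cov))
    return (cov - diag) + alpha * diag


if __name__ == "__main__":
    # Simulated post-ReLU activations: non-negative, like real VGG features.
    feats = np.maximum(np.random.randn(64, 32, 32), 0.0)
    print("Gram min entry:      ", gram_matrix(feats).min())        # >= 0 by construction
    print("Covariance min entry:", covariance_matrix(feats).min())  # negative correlations survive

For non-negative (post-ReLU) activations every Gram entry is a sum of non-negative products, so negatively correlated channel pairs cannot be distinguished from uncorrelated ones; after centering, those pairs show up as negative covariance entries, which is the property the abstract refers to.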



Contents
  Abstract
  Acknowledgements
  Contents
  List of Figures
  List of Tables
  1 Introduction
  2 Review of Related Literature
    2.0.1 Style Transfer
    2.0.2 Controlling Perceptual Factors
    2.0.3 Failure Cases of Neural Style Transfer
    2.0.4 Style Feature Scaling
  3 Method
    3.0.1 Overview
    3.0.2 Extracting the Content of an Image
    3.0.3 Extracting the Style of an Image
    3.0.4 Covariance Matrix vs Gram Matrix
    3.0.5 Reducing Pattern Noise
    3.0.6 Masking the Covariance to Control Stroke Strength
    3.0.7 Controlling Pattern Density
  4 Experimental Results
    4.0.1 Implementation Details
    4.0.2 Ablation Study
    4.0.3 Experiments with Stroke Strength
    4.0.4 Experiments with Pattern Density
    4.0.5 Comparison to Other Models
  5 User Study
  6 Conclusions
  7 Supplemental Results
  References

