
Author: 邱仁炎
Timotius Kuncoro
Thesis Title: Latent Space Explorer in StyleGAN for Attribute Editing
Advisor: 楊傳凱
Chuan-Kai Yang
Committee: 賴源正
Yuan-Cheng Lai
林伯慎
Bor-Shen Lin
Degree: Master
Department: School of Management - Department of Information Management
Thesis Publication Year: 2023
Graduation Academic Year: 111
Language: English
Pages: 62
Keywords (in other languages): Generative Modelling, Latent Space Exploration

  • The goal of the program developed in this thesis is to edit attributes of car
    images through latent vector exploration. The attribute explored is the ambience
    of an image. After the latent space exploration is finished, the resulting latent
    vectors that have more than one ambience are saved and then shown as a 2D data
    distribution using Principal Component Analysis (PCA). The final product of this
    project is a set of Python programs. Because exploring the latent space and saving
    the latent vectors takes a long time, the program is divided into two parts: the
    first is the latent vector explorer, and the second displays the saved latent
    vectors as a 2D distribution.
    To achieve this goal, the program uses an unsupervised approach to generate
    images with a Generative Adversarial Network (GAN), specifically one of its
    extended versions, StyleGAN. Current unsupervised GAN techniques require a large
    number of high-quality images to train the network and perform attribute editing.
    For that reason, this thesis uses a customized Stanford Cars dataset, created by
    adding ambiences to the training images. The exploration is evaluated manually.
    For clearer visualization, the resulting latent vectors are plotted as a 2D
    distribution of points.
    Although training and exploration are time-consuming, the StyleGAN technique
    produces convincing qualitative results from the dataset. After training and
    exploration, generating one image takes only a few seconds, so it is possible to
    build interactive user-interface applications that show the saved latent vectors.
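The second stage described above, projecting saved latent vectors to 2D with PCA, could look roughly like the following. This is only a minimal sketch under stated assumptions, not the thesis's actual program: the function name `pca_project_2d`, the random stand-in data, and the 512-dimensional latent size are illustrative choices, and PCA is computed here directly with NumPy's SVD rather than with any particular library the author may have used.

```python
import numpy as np


def pca_project_2d(latents):
    """Project (n_samples, n_dims) latent vectors onto their first two principal components."""
    latents = np.asarray(latents, dtype=np.float64)
    # PCA operates on mean-centered data.
    centered = latents - latents.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]
    coords = centered @ components.T
    # Fraction of total variance captured by each of the two components.
    explained = singular_values[:2] ** 2 / np.sum(singular_values ** 2)
    return coords, explained


# Hypothetical stand-in for the latent vectors saved by the explorer stage.
# The 512-dimensional size is an assumption, not a figure taken from the record.
rng = np.random.default_rng(0)
saved_latents = rng.standard_normal((200, 512))

coords, explained = pca_project_2d(saved_latents)
print(coords.shape)  # (200, 2)
```

In a two-part workflow like the one the abstract describes, the stand-in array would be replaced by vectors loaded from whatever file the explorer stage wrote, and `coords` would be handed to a plotting library to draw the 2D point distribution.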

    ABSTRACT
    Acknowledgment
    Table of Content
    List of Figures
    List of Table
    Chapter 1. Introduction
      1.1 Background
      1.2 GAN & StyleGAN
      1.3 Research Objectives & Scope
      1.4 Thesis Organization
    Chapter 2. Literature Review
      2.1 Learning representations from unlabeled data
      2.2 Generative Modelling Problem
      2.3 Generative Adversarial Networks
      2.4 Progressive Growing GAN
      2.5 StyleGan
      2.6 Latent Space Exploration
    Chapter 3. Methodology
      3.1 System Overview
      3.2 Input and Output of the System
      3.3 Complete System Architecture
      3.4 Data Flow of the System
      3.5 Actor
      3.6 Preprocessing
    Chapter 4. Experimental Results
      4.1 StyleGAN2 Training Experiments Results
      4.2 Latent Space Explorer Experiments Results
      4.3 Latent Space Visualizer Experiments Results
    Chapter 5. Conclusion and Discussion
      5.1 Conclusion
      5.2 Limitation and Future Work
    References

