Author: | 邱仁炎 Timotius Kuncoro |
---|---|
Thesis Title: | Latent Space Explorer in StyleGAN for Attribute Editing |
Advisor: | 楊傳凱 Chuan-Kai Yang |
Committee: | 賴源正 Yuan-Cheng Lai, 林伯慎 Bor-Shen Lin |
Degree: | Master |
Department: | College of Management - Department of Information Management |
Thesis Publication Year: | 2023 |
Graduation Academic Year: | 111 |
Language: | English |
Pages: | 62 |
Keywords (in other languages): | Generative Modelling, Latent Space Exploration |
The program developed in this thesis performs car attribute editing through
latent vector exploration. The attribute explored is the ambience of an image.
Once the latent space exploration is finished, the resulting latent vectors that
exhibit more than one ambience are saved and then displayed as a 2D data
distribution using Principal Component Analysis (PCA). The final product of this
project is a set of Python programs. Because exploring the latent space and
saving the latent vectors takes considerable time, the program is divided into
two parts: the first is the latent vector explorer, and the second displays the
saved latent vectors as a 2D distribution.
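The PCA step described above can be sketched as follows. This is only a minimal illustration, not the thesis's program: it computes PCA through NumPy's SVD, the function name `project_to_2d` is hypothetical, and the 512-dimensional random vectors are placeholders standing in for the saved latent vectors.

```python
import numpy as np


def project_to_2d(latents: np.ndarray) -> np.ndarray:
    """Project (n_samples, n_dims) latent vectors onto their first two
    principal components (hypothetical helper, not from the thesis)."""
    # PCA requires mean-centered data.
    centered = latents - latents.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of every sample along the top two directions.
    return centered @ vt[:2].T


# Placeholder data: 20 random vectors of an assumed dimension of 512.
rng = np.random.default_rng(0)
saved_latents = rng.standard_normal((20, 512))

points_2d = project_to_2d(saved_latents)
print(points_2d.shape)
```

Each row of `points_2d` is one saved latent vector reduced to two coordinates, which can then be drawn as a scatter plot.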
To achieve this goal, the program takes an unsupervised approach to image
generation, using a Generative Adversarial Network (GAN) and one of its
extensions, StyleGAN. Current unsupervised GAN techniques require a large number
of high-quality images both to train the network and to perform attribute
editing. For that reason, this thesis uses a customized Stanford Cars dataset.
The customization consists of adding ambiences to the training images. The
exploration results are evaluated manually. To make the resulting latent
vectors easier to inspect, they are visualized as a 2D distribution of points.
Although training and exploration are time-consuming, the StyleGAN technique
produces convincing qualitative results on this dataset. Once training and
exploration are complete, generating a single image takes only a few seconds,
so it is possible to build interactive user-interface applications that
display the saved latent vectors.