
Student: Jonathan Hans Soeseno
Thesis Title: Controllable and Identity-Aware Facial Attribute Transformation
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Conrado D. Ruiz, Jr.
Kuo-Liang Chung (鍾國亮)
Yu-Chi Lai (賴祐吉)
Jing-Ming Guo (郭景明)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2018
Graduation Academic Year: 107
Language: English
Pages: 57
Keywords: Image to Image Translation, Deep Learning, Generative Adversarial Network, Identity Aware, Controllable Transformation
Access Counts: Views: 290; Downloads: 0
    Modifying facial attributes without a paired dataset is a challenging task. Previous approaches either require supervision from a ground-truth transformed image or require training a separate model for mapping every pair of attributes. This limits the scalability of such models to larger sets of attributes, since the number of models that must be trained grows quadratically with the number of attributes. Another major drawback of previous approaches is that they unintentionally change the identity of the person as they transform the facial attributes. We propose a method that allows controllable and identity-aware transformations across multiple facial attributes using only a single model. Our approach is to train a generative adversarial network (GAN) with a multi-task conditional discriminator that recognizes the identity of the face, distinguishes real images from fake ones, and identifies the facial attributes present in an image. This guides the generator into producing output that is realistic while preserving the person's identity and facial attributes. Through this framework, our model also learns meaningful image representations in a lower-dimensional latent space and semantically associates separate parts of the encoded vector with the person's identity and facial attributes. This opens up the possibility of generating new faces and other dataset-augmentation processes.
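
    The framework described in the abstract (one shared discriminator trunk with real/fake, identity, and attribute heads) can be made concrete with a short code sketch. The PyTorch snippet below is a minimal illustration under stated assumptions only: the layer sizes, the head names (adv_head, id_head, attr_head), the discriminator_losses helper, and the equal loss weighting are all hypothetical for exposition and are not taken from the thesis.

        # Minimal sketch of a multi-task conditional discriminator in the
        # spirit of the abstract. All layer sizes, head names, and loss
        # weights are illustrative assumptions, not the thesis's design.
        import torch
        import torch.nn as nn

        class MultiTaskDiscriminator(nn.Module):
            """A shared convolutional trunk feeding three task heads:
            real/fake (adversarial), identity, and facial attributes."""

            def __init__(self, num_identities: int, num_attributes: int):
                super().__init__()
                self.trunk = nn.Sequential(
                    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
                    nn.LeakyReLU(0.2),
                    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
                    nn.LeakyReLU(0.2),
                    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
                    nn.LeakyReLU(0.2),
                    nn.AdaptiveAvgPool2d(1),
                    nn.Flatten(),
                )
                self.adv_head = nn.Linear(256, 1)                # real vs. generated
                self.id_head = nn.Linear(256, num_identities)    # identity classification
                self.attr_head = nn.Linear(256, num_attributes)  # multi-label attributes

            def forward(self, x: torch.Tensor):
                h = self.trunk(x)
                return self.adv_head(h), self.id_head(h), self.attr_head(h)

        def discriminator_losses(disc, real_img, fake_img, id_labels, attr_labels):
            """Combines the three training signals the abstract lists: realism,
            identity recognition, and attribute recognition. id_labels is a
            long tensor of class ids; attr_labels is a float tensor of 0/1 flags."""
            bce = nn.BCEWithLogitsLoss()
            ce = nn.CrossEntropyLoss()
            adv_real, id_real, attr_real = disc(real_img)
            adv_fake, _, _ = disc(fake_img.detach())
            loss_adv = (bce(adv_real, torch.ones_like(adv_real))
                        + bce(adv_fake, torch.zeros_like(adv_fake)))
            loss_id = ce(id_real, id_labels)        # identity learned on real faces
            loss_attr = bce(attr_real, attr_labels)
            return loss_adv + loss_id + loss_attr   # equal weighting is an assumption

    In the same spirit, the generator's loss would reuse adv_head for realism while id_head and attr_head penalize outputs that drift from the source identity or change unintended attributes; the latent-space property the abstract mentions would correspond to slicing the encoder's output into separate identity and attribute sub-vectors.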

    Table of Contents
    Recommendation Letter
    Approval Letter
    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    List of Algorithms
    1 Introduction
    2 Related Works
    3 Method
      3.0.1 Problem Formulation
      3.0.2 Network Architecture
      3.0.3 Multi-task Discriminator
      3.0.4 Generator
    4 Experimental Results
      4.0.1 Dataset
      4.0.2 Implementation Details
      4.0.3 Ablation Studies
      4.0.4 Exploring the encoded space
      4.0.5 Comparison to previous work
    5 Conclusions
    References

    [1] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), IEEE, October 2017.
    [2] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, “Stargan: Unified generative adversarial networks for multi-domain image-to-image translation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, June 2018.
    [3] B. Amos, B. Ludwiczuk, and M. Satyanarayanan, “Openface: A general-purpose face recognition library with mobile applications,” tech. rep., CMU-CS-16-118, CMU School of Computer Science, June 2016.
    [4] G. Antipov, M. Baccouche, and J.-L. Dugelay, “Face aging with conditional generative adversarial networks,” arXiv preprint arXiv:1702.01983, February 2017.
    [5] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, July 2017.
    [6] T. Kim, B. Kim, M. Cha, and J. Kim, “Unsupervised visual attribute transfer with reconfigurable generative adversarial networks,” arXiv preprint arXiv:1707.09798, July 2017.
    [7] M.-Y. Liu, T. Breuel, and J. Kautz, “Unsupervised image-to-image translation networks,” in Advances in Neural Information Processing Systems 30 (I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), Curran Associates, Inc., December 2017.
    [8] G. Perarnau, J. van de Weijer, B. Raducanu, and J. M. Álvarez, “Invertible Conditional GANs for image editing,” in NIPS Workshop on Adversarial Training, Curran Associates, Inc., December 2016.
    [9] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” in Proceedings of the International Conference on Learning Representations (ICLR), May 2016.
    [10] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in Proceedings of the International Conference on Computer Vision (ICCV), IEEE, December 2015.
    [11] E. Reinhard, M. Adhikhmin, B. Gooch, and P. Shirley, “Color transfer between images,” in IEEE Computer Graphics and Applications, IEEE, October 2001.
    [12] A. Levin, D. Lischinski, and Y. Weiss, “Colorization using optimization,” in ACM Transactions on Graphics (TOG), ACM, August 2004.
    [13] D. Guo and T. Sim, “Digital face makeup by example,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, June 2009.
    [14] L. Liu, J. Xing, S. Liu, H. Xu, X. Zhou, and S. Yan, “Wow! you are so beautiful today!,” in ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), ACM, October 2014.
    [15] W.-S. Tong, C.-K. Tang, M. S. Brown, and Y.-Q. Xu, “Example-based cosmetic transfer,” in Proceedings of the 15th Pacific Conference on Computer Graphics and Applications (PG ’07), IEEE, December 2007.
    [16] F. Yang, J. Wang, E. Shechtman, L. Bourdev, and D. Metaxas, “Expression flow for 3d-aware face component transfer,” in ACM Transactions on Graphics (TOG), ACM, August 2011.
    [17] X. Yin and X. Liu, “Multi-task convolutional neural network for pose-invariant face recognition,” in IEEE Transactions on Image Processing, IEEE, October 2017.
    [18] Y. Zhang, W. Dong, C. Ma, X. Mei, K. Li, F. Huang, B.-G. Hu, and O. Deussen, “Data-driven synthesis of cartoon faces using different styles,” in IEEE Transactions on Image Processing, IEEE, January 2017.
    [19] D. Zhang, L. Lin, T. Chen, X. Wu, W. Tan, and E. Izquierdo, “Content-adaptive sketch portrait generation by decompositional representation learning,” in IEEE Transactions on Image Processing, IEEE, January 2017.
    [20] M. Zhang, J. Li, N. Wang, and X. Gao, “Compositional model-based sketch generator in facial entertainment,” in IEEE Transactions on Cybernetics, IEEE, March 2018.
    [21] X. Chen, X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “Infogan: Interpretable representation learning by information maximizing generative adversarial nets,” in Advances in Neural Information Processing Systems 29, Curran Associates, Inc., December 2016.
    [22] K. Gregor, I. Danihelka, A. Graves, D. Rezende, and D. Wierstra, “Draw: A recurrent neural network for image generation,” in Proceedings of the 32nd International Conference on Machine Learning, PMLR, July 2015.
    [23] L. Tran, X. Yin, and X. Liu, “Disentangled representation learning gan for pose-invariant face recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, July 2017.
    [24] M. Ochs, E. Diday, and F. Afonso, “From the symbolic analysis of virtual faces to a smiles machine,” in IEEE Transactions on Cybernetics, IEEE, February 2016.
    [25] Y. Lin, J. Chen, Y. Cao, Y. Zhou, L. Zhang, Y. Y. Tang, and S. Wang, “Cross-domain recognition by identifying joint subspaces of source domain and target domain,” in IEEE Transactions on Cybernetics, IEEE, April 2017.
    [26] S. C. Hidayati, C.-W. You, W.-H. Cheng, and K.-L. Hua, “Learning and recognition of clothing genres from full-body images,” in IEEE Transactions on Cybernetics, IEEE, May 2018.
    [27] K.-H. Lo, Y.-C. F. Wang, and K.-L. Hua, “Edge-preserving depth map upsampling by joint trilateral filter,” in IEEE Transactions on Cybernetics, IEEE, January 2018.
    [28] K. Zeng, J. Yu, R. Wang, C. Li, and D. Tao, “Coupled deep autoencoder for single image superresolution,” in IEEE Transactions on Cybernetics, IEEE, January 2017.
    [29] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, Curran Associates, Inc., December 2014.
    [30] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in Proceedings of the 31st International Conference on Machine Learning, PMLR, June 2014.
    [31] J.-Y. Zhu, P. Krähenbühl, E. Shechtman, and A. A. Efros, “Generative visual manipulation on the natural image manifold,” in Proceedings of European Conference on Computer Vision (ECCV), Springer Science, October 2016.
    [32] M. F. Mathieu, J. J. Zhao, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun, “Disentangling factors of variation in deep representation using adversarial training,” in Advances in Neural Information Processing Systems 29 (D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, eds.), Curran Associates, Inc., December 2016.
    [33] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training gans,” in Advances in Neural Information Processing Systems 29, Curran Associates, Inc., December 2016.
    [34] M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, November 2014.
    [35] A. Odena, C. Olah, and J. Shlens, “Conditional image synthesis with auxiliary classifier gans,” arXiv preprint arXiv:1610.09585, October 2016.
    [36] X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, and S. Belongie, “Stacked generative adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, July 2017.
    [37] A. Sage, E. Agustsson, R. Timofte, and L. Van Gool, “Logo synthesis and manipulation with clustered generative adversarial networks,” arXiv preprint arXiv:1712.04407, December 2017.
