Simple Search / Detailed Record

Graduate Student: 黃銘俊 (Richard Firdaus Oeyliawan)
Thesis Title: Photo-to-Emoji Transformation with TraVeLGAN and Perceptual Loss (基於 TraVeLGAN 與 Perceptual Loss 實現照片轉換表情符號之應用)
Advisor: 林伯慎 (Bor-Shen Lin)
Committee Members: 羅乃維 (Nai-Wei Lo), 楊傳凱 (Chuan-Kai Yang)
Degree: Master
Department: School of Management - Department of Information Management
Year of Publication: 2020
Graduation Academic Year: 108 (2019-2020)
Language: English
Number of Pages: 50
Keywords: cartoonization, photo-to-emoji transformation, Siamese network, generative adversarial network, TraVeLGAN, perceptual loss


Cartoons are a medium that can both convey information and entertain, so the cartoonization of real photos is of practical interest; it may be achieved with image transformation approaches such as generative adversarial networks (GANs). A conventional GAN, however, may suffer from mode collapse, i.e. it generates highly similar images. This is because the only constraint imposed on the generator is to produce something resembling the real images, which permits a many-to-one mapping. TraVeLGAN tackles this issue by forcing the synthesized images to remain separated while keeping their pairwise relationships similar to those of the original images. In this study, the transformation of human faces from photo-realistic images to emoji images based on TraVeLGAN is investigated. In an initial experiment, TraVeLGAN generated images with higher diversity, but they often had mismatched semantic attributes, such as hair color, skin color, or head shape. To alleviate this problem, a perceptual loss computed from VGG19 is used together with TraVeLGAN, since the perceptual loss can pull the output image closer to the input image on the feature maps. Experimental results show that this combination produces images with better quality and higher SSIM scores. In addition, a perceptual loss taken from a shallower layer, such as the first or second convolutional layer, gives higher similarity and better quality. Finally, an ordinary GAN with perceptual loss is trained for comparison, and the result confirms that TraVeLGAN is helpful for improving image quality.
Keywords: cartoonization, photo-to-emoji transformation, Siamese network, generative adversarial network, TraVeLGAN, perceptual loss. 
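The abstract above describes two mechanisms: TraVeLGAN's transformation-vector constraint, which keeps generated images both separated and arranged like the originals in a Siamese embedding space, and a perceptual loss over shallow VGG19 feature maps. The sketch below, assuming PyTorch and torchvision, illustrates both ideas; the Siamese encoder S, the generator G, the margin value, and the choice of VGG19 layer are illustrative assumptions, not the thesis's exact configuration.

```python
# Sketch of the two losses described in the abstract (assumptions noted above).
import torch
import torch.nn.functional as F
from torchvision import models

def travel_loss(S, G, x, margin=10.0):
    """Transformation-vector loss: for every image pair (i, j), the vector
    S(x_i) - S(x_j) should match S(G(x_i)) - S(G(x_j)) in both angle and
    magnitude; a margin term keeps the embeddings from collapsing together.
    Assumes a batch size of at least 2 so that image pairs exist."""
    z_real, z_fake = S(x), S(G(x))                     # (B, d) embeddings
    i, j = torch.triu_indices(x.size(0), x.size(0), offset=1)
    v_real = z_real[i] - z_real[j]                     # vectors between real pairs
    v_fake = z_fake[i] - z_fake[j]                     # vectors between generated pairs
    angle = 1.0 - F.cosine_similarity(v_real, v_fake, dim=-1)
    length = (v_real - v_fake).pow(2).sum(dim=-1)
    separation = F.relu(margin - v_real.norm(dim=-1))  # forces embeddings apart
    return (angle + length + separation).mean()

class PerceptualLoss(torch.nn.Module):
    """MSE between VGG19 feature maps of the input photo and the generated
    emoji; layer_index=2 stops after the second convolution (a shallow layer,
    matching the abstract's observation about shallower layers)."""
    def __init__(self, layer_index=2):
        super().__init__()
        self.features = models.vgg19(pretrained=True).features[: layer_index + 1].eval()
        for p in self.features.parameters():
            p.requires_grad = False                    # VGG19 stays frozen

    def forward(self, generated, target):
        return F.mse_loss(self.features(generated), self.features(target))
```

In training, these terms would be added, with tuned weights, to the generator's usual adversarial loss; the thesis's exact weighting is not given in this record, so any values used with this sketch would have to be chosen empirically.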

Contents

Abstract
Acknowledgement
Contents
List of Figures
List of Formulas
List of Tables
Chapter 1 Introduction
  1.1 Background & Motivation
  1.2 Contribution
  1.3 Summary
Chapter 2 Literature Review
  2.1 Convolutional Neural Network
  2.2 Generative Adversarial Network
  2.3 Image Transformation
  2.4 VGG Network
  2.5 Siamese Network
  2.6 U-Net Network
  2.8 TraVeLGAN
  2.9 Metric Evaluation
    2.9.1 Structural Similarity
    2.9.2 Fréchet Inception Distance Score (FID Score)
Chapter 3 Baseline
  3.1 Experiment Setup
  3.2 GAN
  3.3 TraVeLGAN
Chapter 4 Improvement and Analysis
  4.1 TraVeLGAN with Perceptual Loss
  4.2 Implementation
  4.3 Additional Experiment using original GAN with Perceptual Loss
  4.5 Discussion
Chapter 5 Conclusion
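The structural similarity (SSIM) score listed in Section 2.9.1, and used for the comparisons reported in the abstract, is available off the shelf. A minimal sketch, assuming scikit-image (0.19 or later) and placeholder file names:

```python
# Compute SSIM between an input photo and its generated emoji.
from skimage import io
from skimage.metrics import structural_similarity

reference = io.imread("photo.png")   # input face photo (placeholder path)
generated = io.imread("emoji.png")   # generated emoji (placeholder path)

# channel_axis=-1 treats the last axis as color channels (RGB images)
score = structural_similarity(reference, generated, channel_axis=-1)
print(f"SSIM = {score:.4f}")         # 1.0 would mean identical images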
