Simple Search / Detailed Record

Graduate Student: 黃銘俊 (Richard Firdaus Oeyliawan)
Thesis Title: Photo-to-Emoji Transformation with TraVeLGAN and Perceptual Loss (基於 TraVeLGAN 與 Perceptual Loss 實現照片轉換表情符號之應用)
Advisor: 林伯慎 (Bor-Shen Lin)
Committee Members: 羅乃維 (Nai-Wei Lo), 楊傳凱 (Chuan-Kai Yang)
Degree: Master
Department: School of Management - Department of Information Management
Year of Publication: 2020
Graduation Academic Year: 108 (2019-2020)
Language: English
Number of Pages: 50
Keywords: cartoonization, photo-to-emoji transformation, Siamese network, generative adversarial network, TraVeLGAN, perceptual loss


Cartoons are a medium that can both convey information and entertain, so the cartoonization of real photos is of practical interest; it may be achieved with image transformation approaches such as generative adversarial networks (GANs). A conventional GAN, however, may suffer from mode collapse, i.e. it generates highly similar images. This is because the only constraint imposed on the generator is to produce something resembling the real images, which permits a many-to-one mapping. TraVeLGAN tackles this issue by forcing the synthesized images to remain separated while keeping their pairwise relationships similar to those of the original images. In this study, the transformation of human faces from photo-realistic images to emoji images based on TraVeLGAN is investigated. In an initial experiment, TraVeLGAN generated images with higher diversity, but they often had mismatched semantic attributes, such as hair color, skin color, or head shape. To alleviate this problem, a perceptual loss computed from VGG19 is used together with TraVeLGAN, since the perceptual loss can pull the output image closer to the input image on the feature maps. Experimental results show that this combination produces images with better quality and higher SSIM scores. In addition, a perceptual loss taken from a shallower layer, such as the first or second convolutional layer, gives higher similarity and better quality. Finally, an ordinary GAN with perceptual loss is trained for comparison, and the result confirms that TraVeLGAN is helpful for improving image quality.
Keywords: cartoonization, photo-to-emoji transformation, Siamese network, generative adversarial network, TraVeLGAN, perceptual loss. 
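The abstract above describes two mechanisms: TraVeLGAN's transformation-vector constraint, which keeps generated images both separated and arranged like the originals in a Siamese embedding space, and a perceptual loss over shallow VGG19 feature maps. The sketch below, assuming PyTorch and torchvision, illustrates both ideas; the Siamese encoder S, the generator G, the margin value, and the choice of VGG19 layer are illustrative assumptions, not the thesis's exact configuration.

```python
# Sketch of the two losses described in the abstract (assumptions noted above).
import torch
import torch.nn.functional as F
from torchvision import models

def travel_loss(S, G, x, margin=10.0):
    """Transformation-vector loss: for every image pair (i, j), the vector
    S(x_i) - S(x_j) should match S(G(x_i)) - S(G(x_j)) in both angle and
    magnitude; a margin term keeps the embeddings from collapsing together.
    Assumes a batch size of at least 2 so that image pairs exist."""
    z_real, z_fake = S(x), S(G(x))                     # (B, d) embeddings
    i, j = torch.triu_indices(x.size(0), x.size(0), offset=1)
    v_real = z_real[i] - z_real[j]                     # vectors between real pairs
    v_fake = z_fake[i] - z_fake[j]                     # vectors between generated pairs
    angle = 1.0 - F.cosine_similarity(v_real, v_fake, dim=-1)
    length = (v_real - v_fake).pow(2).sum(dim=-1)
    separation = F.relu(margin - v_real.norm(dim=-1))  # forces embeddings apart
    return (angle + length + separation).mean()

class PerceptualLoss(torch.nn.Module):
    """MSE between VGG19 feature maps of the input photo and the generated
    emoji; layer_index=2 stops after the second convolution (a shallow layer,
    matching the abstract's observation about shallower layers)."""
    def __init__(self, layer_index=2):
        super().__init__()
        self.features = models.vgg19(pretrained=True).features[: layer_index + 1].eval()
        for p in self.features.parameters():
            p.requires_grad = False                    # VGG19 stays frozen

    def forward(self, generated, target):
        return F.mse_loss(self.features(generated), self.features(target))
```

In training, these terms would be added, with tuned weights, to the generator's usual adversarial loss; the thesis's exact weighting is not given in this record, so any values used with this sketch would have to be chosen empirically.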

Contents

Abstract
Acknowledgement
Contents
List of Figures
List of Formulas
List of Tables
Chapter 1 Introduction
  1.1 Background & Motivation
  1.2 Contribution
  1.3 Summary
Chapter 2 Literature Review
  2.1 Convolutional Neural Network
  2.2 Generative Adversarial Network
  2.3 Image Transformation
  2.4 VGG Network
  2.5 Siamese Network
  2.6 U-Net Network
  2.8 TraVeLGAN
  2.9 Metric Evaluation
    2.9.1 Structural Similarity
    2.9.2 Fréchet Inception Distance Score (FID Score)
Chapter 3 Baseline
  3.1 Experiment Setup
  3.2 GAN
  3.3 TraVeLGAN
Chapter 4 Improvement and Analysis
  4.1 TraVeLGAN with Perceptual Loss
  4.2 Implementation
  4.3 Additional Experiment using original GAN with Perceptual Loss
  4.5 Discussion
Chapter 5 Conclusion
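The structural similarity (SSIM) score listed in Section 2.9.1, and used for the comparisons reported in the abstract, is available off the shelf. A minimal sketch, assuming scikit-image (0.19 or later) and placeholder file names:

```python
# Compute SSIM between an input photo and its generated emoji.
from skimage import io
from skimage.metrics import structural_similarity

reference = io.imread("photo.png")   # input face photo (placeholder path)
generated = io.imread("emoji.png")   # generated emoji (placeholder path)

# channel_axis=-1 treats the last axis as color channels (RGB images)
score = structural_similarity(reference, generated, channel_axis=-1)
print(f"SSIM = {score:.4f}")         # 1.0 would mean identical images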
