
Graduate Student: Trang-Thi Ho (胡氏妝)
Thesis Title: Image Classification and Transformation via Deep Neural Network (基於深度神經網路之影像分類與轉換)
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Yung-Yao Chen (陳永耀), Ching-Hu Lu (陸敬互), Ting-Lan Lin (林鼎然), Yan-Fu Kuo (郭彥甫), Kai-Lung Hua (花凱龍)
Degree: Doctor
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 108 (ROC calendar)
Language: English
Pages: 101
Keywords: Image classification, multi-scale pyramid, Markov random fields, convolutional neural network, generative adversarial networks, image synthesis, semantic keypoints, convolutional autoencoder
As a primary medium through which humans communicate and understand the world, images are among the most important information sources for human intellectual activity. In recent years, deep learning techniques have been applied in a wide range of multimedia applications, and many deep-learning-based systems have been built to improve image classification and transformation performance. However, specific image classification and transformation tasks remain a grand challenge in real-world settings. This study proposes novel approaches to artist-based painting classification and sketch-guided deep portrait generation using deep neural networks.

For artist-based painting classification, we propose multi-scale networks that process an image both globally and locally, creating multiple samples from a single image to achieve accurate predictions. A multi-scale pyramid representation improves the understanding of the characteristics and styles of paintings by different artists. We further demonstrate how Markov random fields, an unsupervised method that requires no additional labeled data, can be employed to model the label relationships between local image patches. In addition, we introduce a new entropy-based exponential fusion scheme to aggregate the models' outputs, and we demonstrate the superior performance of the proposed method on two challenging painting datasets.
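
The abstract does not spell out the fusion formula, so the following is only a minimal sketch of one plausible reading of an "entropy-based exponential fusion": weight each scale's predicted class distribution by exp(-H(p)), where H is the Shannon entropy, so that confident (low-entropy) scales dominate the aggregate. The function name and exact weighting below are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def entropy_exponential_fusion(probs_per_scale):
    """Fuse per-scale class distributions, down-weighting high-entropy (uncertain) scales.

    probs_per_scale: list of (num_classes,) probability vectors, one per pyramid scale.
    Returns a single fused probability vector.
    """
    fused = np.zeros_like(probs_per_scale[0], dtype=float)
    for p in probs_per_scale:
        h = -np.sum(p * np.log(p + 1e-12))  # Shannon entropy of this scale's prediction
        fused += np.exp(-h) * p             # exponential weight: lower entropy -> larger weight
    return fused / fused.sum()              # renormalize to a valid distribution
```

The appeal of an exponential weight is that a near-uniform (uninformative) scale contributes almost nothing, while a sharply peaked scale dominates, without any extra labeled data or tuning.
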
For sketch-guided deep portrait generation, we present a novel framework that generates realistic human body images from sketches while taking into account the structure unique to the human class. Our framework takes a sketch of a human body as input and outputs a realistic human image whose pose matches the sketch. The problem our image synthesis network solves can thus also be phrased as an image-to-image translation problem, in which we translate sketches into photo-realistic images. The main idea is to use semantic keypoints corresponding to the coordinates of 18 essential human body parts (e.g., nose, chest, shoulders, elbows, wrists, knees, ankles, eyes, and ears) as a prior for human sketch-to-image synthesis. Our input sketches are very rough and often lack fine details and coloring, which makes it difficult to extract keypoints from them directly. To obtain semantic keypoints from these rough sketches, we first generate initial images with a fully residual convolutional neural network; from these initial images, a pose estimator can easily extract the semantic keypoints. In addition, we formulate a GAN-based model with several loss terms that adopts nearest-neighbor resize convolution layers for upsampling, which helps the network avoid unwanted checkerboard artifacts. Moreover, we manipulate the colors of different parts of the human body by estimating a human segmentation with ResNet-101 and atrous spatial pyramid pooling (ASPP). We then transform the output images into the HSL color space and control the colors of selected body parts in our generated images. Through various evaluation methods on 6,300 sketch-image pairs, we verify that our proposed method compares favorably against state-of-the-art approaches.
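
The nearest-neighbor resize convolution mentioned above is a standard alternative to transposed convolution in GAN decoders: upsample first, then apply a stride-1 convolution, so the kernel overlaps every output pixel uniformly and checkerboard artifacts cannot arise. Below is a minimal PyTorch sketch; the channel counts, kernel size, and activation are chosen for illustration rather than taken from the dissertation.

```python
import torch.nn as nn

class ResizeConvUpsample(nn.Module):
    """2x upsampling via nearest-neighbor resize followed by a stride-1 convolution.

    Unlike a stride-2 transposed convolution, the kernel contributions here are
    uniform across output pixels, avoiding the checkerboard artifacts noted above.
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),                   # resize step
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),  # learnable filtering
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```
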


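
For the color-control step, once the ResNet-101 + ASPP model has produced a segmentation mask for a body part, recoloring reduces to shifting the hue channel inside that mask in HSL space. Here is a minimal sketch using Python's standard colorsys module; the per-pixel loop, the function name, and the choice to shift only hue are illustrative assumptions, with the mask assumed given.

```python
import colorsys
import numpy as np

def shift_hue_in_region(image, mask, delta_h):
    """Shift the hue of pixels inside a segmented region, leaving the rest untouched.

    image:   (H, W, 3) float RGB array with values in [0, 1]
    mask:    (H, W) boolean array, True for the body part to recolor
    delta_h: hue offset in [0, 1)
    """
    out = image.copy()
    for y, x in zip(*np.nonzero(mask)):
        h, l, s = colorsys.rgb_to_hls(*image[y, x])  # RGB -> HSL (colorsys orders it H, L, S)
        out[y, x] = colorsys.hls_to_rgb((h + delta_h) % 1.0, l, s)
    return out
```

Working in HSL makes the edit perceptually gentle: hue changes the color family while lightness and saturation, which carry most of the shading detail, stay fixed.
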

Abstract
Acknowledgment
Table of contents
List of Figures
List of Tables
1 Introduction
    1.1 Background on image processing
    1.2 Contributions of the dissertation
    1.3 Thesis outline
2 Artist-based painting classification using Markov Random Fields with Convolutional Neural Network
    2.1 Introduction
    2.2 Related work
    2.3 Approach
        2.3.1 Multi-scale Networks
        2.3.2 Decision Refinement using Markov Random Fields
        2.3.3 Entropy-based exponential fusion scheme
    2.4 Experiments
        2.4.1 Experimental Setting
        2.4.2 Ablation analysis
        2.4.3 Discussions
    2.5 Conclusion
3 Sketch-Guided Deep Portrait Generation
    3.1 Introduction
    3.2 Related Work
    3.3 Approach
        3.3.1 Semantic keypoint extraction
        3.3.2 Coarse image generation
        3.3.3 Image refinement
    3.4 Experiments
        3.4.1 Dataset Generation
        3.4.2 Implementation details
        3.4.3 Ablation Study
        3.4.4 Comparison with the state-of-the-art
        3.4.5 Discussion
    3.5 Conclusion
4 Conclusion and Future Directions
    4.1 Contributions
    4.2 Future Research Directions
Appendices
    A Figures
References


Full-text release date: 2025/07/05 (campus network)
Full-text release date: 2025/07/05 (off-campus network)
Full-text release date: 2025/07/05 (National Central Library: Taiwan NDLTD system)