
Graduate Student: Trang-Thi Ho (胡氏妝)
Thesis Title: Image Classification and Transformation via Deep Neural Network
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Yung-Yao Chen (陳永耀), Ching-Hu Lu (陸敬互), Ting-Lan Lin (林鼎然), Yan-Fu Kuo (郭彥甫), Kai-Lung Hua (花凱龍)
Degree: Doctor
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication Year: 2020
Graduation Academic Year: 108
Language: English
Pages: 101
Keywords: Image classification, multi-scale pyramid, Markov random fields, convolutional neural network, generative adversarial networks, image synthesis, semantic keypoints, convolutional autoencoder

As the main medium through which humans communicate and understand the world, images are among the most important information sources for human intelligence activities. In recent years, deep learning techniques have been applied to a wide range of multimedia applications, and many deep learning based systems have been built to improve image classification and transformation performance. However, specific image classification and transformation tasks remain a grand challenge in real-world settings. This study proposes novel approaches that focus on artist-based painting classification and sketch-guided deep portrait generation using deep neural networks.
For artist-based painting classification, we propose multi-scale networks that process an image both globally and locally and create multiple samples from a single image to achieve accurate prediction. A multi-scale pyramid representation improves the understanding of the characteristics and styles of paintings by different artists. We further demonstrate how Markov random fields, an unsupervised method that requires no additional labeled data, can be employed to model the label relationships between local image patches. In addition, we introduce a new entropy-based exponential fusion scheme to aggregate the models' outputs and demonstrate the superior performance of the proposed method on two challenging painting datasets.
On the topic of sketch-guided deep portrait generation, we aim to present a novel framework that generates realistic human body images from sketches and takes into account the structure unique to the human class. Our framework takes in a sketch of a human body and outputs a realistic human image whose pose is similar to that of the sketch. Naturally, the problem our image synthesis network solves can also be phrased as an image-to-image translation problem in which we translate sketches into photo-realistic images. The main idea is to use semantic keypoints corresponding to the coordinates of 18 essential human body parts (e.g., nose, chest, shoulders, elbows, wrists, knees, ankles, eyes, ears) as a prior for human sketch-image synthesis. Our input sketches are very rough and often lack fine details and coloring, so it is difficult to extract keypoints from them directly. To obtain semantic keypoints from these rough sketches, we first generate initial images with a fully residual convolutional neural network; from these initial images, the pose estimator can easily extract the semantic keypoints. In addition, we formulate a GAN-based model with several loss terms that adopts nearest-neighbor-resize convolution layers for upsampling, which helps the network avoid unwanted checkerboard patterns. Moreover, we manipulate the colors of different parts of the human body images by estimating a human segmentation through ResNet-101 and atrous spatial pyramid pooling (ASPP). We then transfer the output images into the HSL colorspace and control the colors of the selected parts of the human body in our generated images. Through various evaluation methods using 6,300 sketch-image pairs, we verify that our proposed method compares favorably against state-of-the-art approaches.
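The fusion scheme is detailed in Chapter 2; as a hedged illustration of the general idea only, the sketch below weights each scale's class-probability vector exponentially by the negative entropy of its prediction, so that confident (low-entropy) scales dominate the aggregate. The function names and the `beta` temperature are assumptions for this sketch, not the dissertation's exact formulation.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability vector (low = confident)."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p))

def entropy_exponential_fusion(prob_list, beta=1.0):
    """Fuse per-scale class-probability vectors, weighting each one
    exponentially by the negative entropy of its prediction."""
    probs = np.asarray(prob_list)                       # (n_models, n_classes)
    weights = np.exp(-beta * np.array([entropy(p) for p in probs]))
    weights /= weights.sum()                            # normalize weights
    fused = weights @ probs                             # weighted average
    return fused / fused.sum()                          # renormalize defensively

# Two scales: one confident, one uncertain; the fused prediction
# follows the confident scale more closely than a plain average would.
p_global = np.array([0.85, 0.10, 0.05])
p_local  = np.array([0.40, 0.35, 0.25])
fused = entropy_exponential_fusion([p_global, p_local])
```

Raising `beta` pushes the fusion further toward winner-take-all behavior; `beta = 0` recovers a plain average.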
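The nearest-neighbor-resize convolution mentioned above follows the general recipe of upsampling first and convolving afterwards, so every output pixel receives contributions from the same number of kernel taps, unlike transposed convolution, whose uneven overlap produces checkerboard artifacts. A minimal NumPy sketch of that recipe (an illustration of the idea, not the dissertation's actual network layers) looks like this:

```python
import numpy as np

def nn_resize(x, factor=2):
    """Nearest-neighbor upsampling of a 2-D feature map."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def conv2d_same(x, kernel):
    """Naive 'same' 2-D convolution with zero padding, for illustration only."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def resize_conv(x, kernel, factor=2):
    """Resize-then-convolve upsampling: uniform kernel coverage per pixel."""
    return conv2d_same(nn_resize(x, factor), kernel)

x = np.arange(4.0).reshape(2, 2)
k = np.full((3, 3), 1 / 9.0)          # simple averaging kernel
y = resize_conv(x, k)                 # 4x4 upsampled, smoothed map
```

In a real network the kernel would be learned and the operation applied per channel; the structural point is only the upsample-then-convolve ordering.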


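As a small illustration of the kind of color control HSL-type spaces allow (using Python's `colorsys`, which orders the channels as HLS; this is an assumption-level sketch, not the dissertation's segmentation-guided pipeline), a region's hue can be replaced while its lightness and saturation are preserved:

```python
import colorsys

def shift_hue(rgb, new_hue):
    """Move an RGB color (0-1 floats) to a new hue while keeping its
    lightness and saturation, mimicking per-part color control."""
    h, l, s = colorsys.rgb_to_hls(*rgb)   # note: colorsys uses HLS order
    return colorsys.hls_to_rgb(new_hue, l, s)

red = (0.8, 0.2, 0.2)
greenish = shift_hue(red, 1 / 3)   # hue at 1/3 of the circle is green
# greenish ≈ (0.2, 0.8, 0.2)
```

In the full system this operation would be restricted to the pixels selected by the ResNet-101 + ASPP segmentation, so only the chosen body part changes color.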

Abstract — i
Acknowledgment — iii
Table of contents — iv
List of Figures — vi
List of Tables — xi
1 Introduction — 1
1.1 Background on image processing — 1
1.2 Contributions of the dissertation — 2
1.3 Thesis outline — 3
2 Artist-based painting classification using Markov Random Fields with Convolutional Neural Network — 4
2.1 Introduction — 4
2.2 Related work — 7
2.3 Approach — 9
2.3.1 Multi-scale Networks — 10
2.3.2 Decision Refinement using Markov Random Fields — 11
2.3.3 Entropy-based exponential fusion scheme — 12
2.4 Experiments — 14
2.4.1 Experimental Setting — 15
2.4.2 Ablation analysis — 16
2.4.3 Discussions — 23
2.5 Conclusion — 32
3 Sketch-Guided Deep Portrait Generation — 36
3.1 Introduction — 36
3.2 Related Work — 38
3.3 Approach — 41
3.3.1 Semantic keypoint extraction — 41
3.3.2 Coarse image generation — 42
3.3.3 Image refinement — 43
3.4 Experiments — 47
3.4.1 Dataset Generation — 47
3.4.2 Implementation details — 48
3.4.3 Ablation Study — 49
3.4.4 Comparison with the state-of-the-art — 55
3.4.5 Discussion — 56
3.5 Conclusion — 57
4 Conclusion and Future Directions — 60
4.1 Contributions — 60
4.2 Future Research Directions — 62
Appendices — 63
A Figures — 63
References — 90


Full text release date: 2025/07/05 (campus network)
Full text release date: 2025/07/05 (off-campus network)
Full text release date: 2025/07/05 (National Central Library: Taiwan NDLTD system)