
Graduate Student: 林嘉清 (Daivalentineno Janitra Salim)
Thesis Title: 每個人都是鑑識畫家:草圖到照片的人臉轉換 (Everyone Is a Forensic Artist: Sketch-to-Photo Transformation for Human Face)
Advisor: 林伯慎 (Bor-Shen Lin)
Committee Members: 羅乃維 (Nai-Wei Lo), 楊傳凱 (Chuan-Kai Yang)
Degree: Master
Department: Department of Information Management, School of Management
Year of Publication: 2020
Graduation Academic Year: 108 (ROC calendar, i.e., 2019-2020)
Language: English
Number of Pages: 145
Keywords (Chinese): Forensic GAN, image generation, sketch-to-photo transformation, photo manipulation, multi-attribute transformation, criminal forensics, criminal investigation
Keywords (English): Forensic GAN, sketch-to-photo transformation, StarGAN, image manipulation, forensic artist, criminal investigation



A great number of crimes happen all over the world every day, in the form of homicide, robbery, rape, violent assault, terrorist attacks, and so on. Among these criminal acts, homicide is the type of crime with the largest number of victims. There are occasionally eyewitnesses who saw the incident and may recognize the criminal's face, so the police often draw and publish composite portraits to find suspects. The human face is the most distinctive and informative part of the human body and can be recognized to identify a person with high certainty; this is why face recognition is widely used by law enforcement agencies to identify or arrest criminals. Since people are generally able to remember human faces, it is usually possible to produce a sketch of a suspect. If the sketch could be further converted into a photo-realistic image with modified facial attributes (such as a different hair color or the addition of glasses), it would help locate suspects and accelerate the investigation process.
Some studies have shown that it is possible to transform a sketch of a human face into a photo-realistic image, or to change the style of a face according to desired facial attributes, such as hair color or the presence of glasses. However, there has not yet been an integrated architecture that transforms a facial sketch directly into a photo-realistic image while changing multiple attributes. To address this limitation, we propose Forensic GAN, an architecture that integrates CycleGAN and StarGAN to perform sketch-to-photo transformation and image manipulation according to multiple attributes. We also propose a voting mechanism and a log-sum metric that combine four measures, PSNR, SSIM, SCC, and ERGAS, for an overall evaluation of image quality. Six factors that affect learning, including the dataset, the loss function, data selection, the training strategy, the number of training images, and the number of epochs, are investigated respectively to optimize the synthesis performance. Experimental results show that Forensic GAN can transform a facial sketch into a realistic color photo of good quality, modify multiple facial attributes at a time, and achieve a mean attribute-detection accuracy comparable to that of StarGAN alone.
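The idea of combining several image-quality measures through a log-sum can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's exact formulation: `psnr`, `global_ssim`, and `log_sum_score` are hypothetical helper names, the SSIM here uses global statistics rather than the usual sliding window, and SCC and ERGAS are omitted for brevity.

```python
import math

def psnr(ref, test, max_val=255.0):
    # Peak signal-to-noise ratio between two equally sized images
    # given as flat sequences of pixel intensities.
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def global_ssim(ref, test, max_val=255.0):
    # Simplified SSIM computed from global statistics (single window),
    # using the standard stabilizing constants c1 and c2.
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    n = len(ref)
    mu_x = sum(ref) / n
    mu_y = sum(test) / n
    var_x = sum((v - mu_x) ** 2 for v in ref) / n
    var_y = sum((v - mu_y) ** 2 for v in test) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(ref, test)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

def log_sum_score(metrics):
    # Hypothetical log-sum combination: summing logarithms turns a
    # product of metrics on very different scales (e.g. PSNR in dB,
    # SSIM in [0, 1]) into a single additive score.
    return sum(math.log(m) for m in metrics if m > 0)
```

A usage example: for a reference patch and a slightly perturbed copy, one would compute `log_sum_score([psnr(ref, test), global_ssim(ref, test)])` and compare scores across generated images, higher meaning better overall quality under this combination.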

Abstract (in Chinese) iv
Abstract v
Acknowledgement vi
List of Tables x
List of Figures xiii
List of Algorithms xix
Chapter 1 - Introduction 1
1.1. Background 1
1.2. Motivation 3
1.3. Contribution 5
1.4. Summary 6
Chapter 2 - Literature Review 8
2.1. Image to Image Translation 8
2.1.1. Face Photo - Sketch Synthesis 9
2.1.2. Face Image Manipulation 9
2.1.3. Facial Expression Synthesis 9
2.2. Generative Adversarial Networks 10
2.2.1. Generative Adversarial Networks 10
2.2.2. Conditional Generative Adversarial Networks 11
2.2.3. Cycle Generative Adversarial Networks 12
2.2.4. Star Generative Adversarial Networks 17
2.3. Improvement of Generative Adversarial Networks 26
2.3.1. Wasserstein Generative Adversarial Network 26
2.3.2. Wasserstein Generative Adversarial Network with Gradient Penalty 27
2.3.3. Deep Regret Analytics Generative Adversarial Network 29
2.4. Xception: Deep Learning with Depthwise Separable Convolutions 30
2.5. Validation Measurement 32
2.6. Summary 39
Chapter 3 - Pencil Sketch Synthesis 41
3.1. Architecture 41
3.2. Dataset 42
3.3. Implementation, Results, and Validation 43
3.4. Summary 53
Chapter 4 - Synthesis of Forensic Images from Sketch 54
4.1. Pencil-Sketch-Like Images to Photo-Realistic Images 54
4.1.1. Determining the Best Dataset and Loss Function 58
4.1.2. Determining the Need for an Additional Loss Function 64
4.1.3. Determining the Number of Training Images 64
4.2. Star GAN using Deep Regret Analytics 68
4.2.1. Architecture 68
4.2.2. Analysis of Loss Function 70
4.2.3. Implementation, Results, and Validation 73
4.2.4. Investigating the Use of Deep Regret Analytics in Star GAN 80
4.3. Forensic GAN 84
4.3.1. Architecture 84
4.3.2. Analysis of Loss Function 84
4.3.3. Implementation, Results, and Validation 87
4.4. Summary 101
Chapter 5 - Conclusion 104
Appendix 107
A. Transform CelebA RGB to Pencil Sketch Style using OpenCV 107
B. Transform CelebA RGB to Pencil Sketch Style using Photoshop 109
References 120

