
Author: Man-Ling Liao (廖曼伶)
Thesis Title: Applying Facial Features and Similarity Analysis in the Detection of AI-Generated Images (應用人臉特徵與相似度於AI生成圖片之檢測)
Advisor: Jiann-Liang Chen (陳俊良)
Committee Members: Sy-Yen Kuo (郭斯彥), Ing-Yi Chen (陳英一), Chih-Lin Hu (胡誌麟)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2023
Graduation Academic Year: 111 (ROC calendar; 2022-2023)
Language: English
Pages: 52
Keywords: Stable Diffusion, Face Similarity, Face Features, Fake News
  • With the rapid development of AI technology, modern society increasingly relies on AI and the conveniences it brings. In this digital age, AI has brought countless benefits to daily life and the economy. In the second half of 2022, new AI models emerged that upended our understanding of AI, and the Stable Diffusion model in particular challenged the general public's perception of it. We used to believe that the most significant difference between AI and humans was that AI could not create new things; Stable Diffusion shattered this deeply entrenched belief.
    Stable Diffusion is a text-to-image generation model based on the Latent Diffusion Model: users can generate an image that matches a text description. Its open-source nature has made Stable Diffusion a mainstream model for AI image generation.
    With Stable Diffusion, the general public can generate images of people so lifelike that the human eye can hardly tell them from real photographs. This forces us to take seriously the risk that bad actors will use such images to spread fake news and malicious information. Developing a detection system that keeps these images from misleading viewers or enabling fraud is therefore an urgent matter.
    After observing many AI-generated images of people, it was found that when a generated image contains more than one person, the faces resemble one another to a certain degree. Setting aside twins and photographs that combine real people with AI-generated people, this study uses facial similarity and facial features to determine whether an image of people is AI-generated. Detection proceeds in three stages. First, a face extraction model extracts the facial features from the image. Second, if the image contains more than one person, a facial similarity model measures how similar the faces are and judges whether the image is AI-generated. Third, if the second stage judges the image to be an ordinary photograph, or if the image contains only one person, a detection model decides whether the image is AI-generated. The goal is to prevent the spread of false information and the occurrence of fraud.
    The proposed model achieves an accuracy of 96.4% on the validation dataset. The results indicate that the proposed detection model is reasonably capable of detecting AI-generated images of people.
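    The three-stage flow described in the abstract can be sketched roughly as below. This is only an illustration of the control flow, not the thesis implementation: the table of contents names RetinaFace, DeepFace, and EfficientNet for the three stages, but here they are replaced by caller-supplied stand-ins. The function names, the cosine similarity measure, the 0.9 threshold, and the rules "any similar pair" and "any face flagged" are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt
from typing import Callable, List, Sequence

Embedding = Sequence[float]  # hypothetical: one feature vector per detected face


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two face feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_ai_generated(
    image: object,
    extract_faces: Callable[[object], List[Embedding]],
    classify_face: Callable[[Embedding], bool],
    similarity_threshold: float = 0.9,  # hypothetical value, not from the thesis
) -> bool:
    """Return True if the image is judged to be AI-generated."""
    # Stage 1: extract the facial features of every person in the image.
    faces = extract_faces(image)
    if not faces:
        # Assumption: images without faces are outside the scope of this sketch.
        raise ValueError("no face found in image")

    # Stage 2: with more than one face, flag the image when any pair of faces
    # is suspiciously similar (assumed decision rule).
    if len(faces) > 1:
        for first, second in combinations(faces, 2):
            if cosine_similarity(first, second) >= similarity_threshold:
                return True

    # Stage 3: single-face images, and multi-face images that passed stage 2,
    # go to the per-face detection model (assumed rule: any flagged face).
    return any(classify_face(face) for face in faces)
```

    To try the sketch, pass any two callables with the shapes above, for example an `extract_faces` that returns precomputed vectors and a `classify_face` that wraps a trained classifier. Keeping the models behind callables is a design choice of this example so that the routing logic can be exercised without loading any real model.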

    Abstract (in Chinese) I
    Abstract II
    Acknowledgement IV
    List of Figures VII
    List of Tables VIII
    Chapter 1 Introduction 1
        1.1 Motivation 1
        1.2 Contributions 4
        1.3 Organization 6
    Chapter 2 Related Work 8
        2.1 Stable Diffusion 8
        2.2 Facial Detection & Extraction Techniques 11
        2.3 Face Similarity 14
        2.4 EfficientNet Model [26] 14
    Chapter 3 Proposed System 17
        3.1 System Architecture 17
        3.2 Face Detection Model & Dataset 18
            3.2.1 RetinaFace Model 18
            3.2.2 Dataset 19
        3.3 Similarity Model 22
            3.3.1 DeepFace Model 22
        3.4 AI Image Detection Model 24
            3.4.1 EfficientNet Model 24
    Chapter 4 Performance Analysis 26
        4.1 System Environment and Parameter Settings 26
        4.2 Performance Evaluation Metrics 28
        4.3 Performance Analysis 30
            4.3.1 Analysis of Face Similarity Results 30
            4.3.2 Analysis of EfficientNet Model Results 31
        4.4 Summary 35
    Chapter 5 Conclusions and Future Works 36
        5.1 Conclusions 36
        5.2 Future Works 37
    References 40

    [1] J. Devlin and M.-W. Chang, “Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing,” https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html (Accessed Feb. 7, 2023).
    [2] S. Albawi, T. A. Mohammed and S. Al-Zawi, “Understanding of a convolutional neural network,” Proceedings of the 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 2017, pp. 1-6.
    [3] BBC News, “Artificial intelligence: Google's AlphaGo beats Go master Lee Se-dol,” https://www.bbc.com/news/technology-35785875 (Accessed Feb. 7, 2023).
    [4] OpenAI, “OpenAI Five defeats Dota 2 world champions,” https://openai.com/research/openai-five-defeats-dota-2-world-champions (Accessed Feb. 8, 2023).
    [5] DeepMind, “AlphaStar: Mastering the real-time strategy game StarCraft II,” https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii (Accessed Feb. 15, 2023).
    [6] Time, “How This Poker-Playing Computer Beat the Best Human Players,” https://time.com/4656011/artificial-intelligence-ai-poker-tournament-libratus-cmu/ (Accessed Mar. 3, 2023).
    [7] DeepMind, “Competitive programming with AlphaCode,” https://www.deepmind.com/blog/competitive-programming-with-alphacode (Accessed Mar. 7, 2023).
    [8] IBM, “IBM Global AI Adoption Index 2022,” https://www.ibm.com/watson/resources/ai-adoption (Accessed Mar. 8, 2023).
    [9] OpenAI, “Introducing ChatGPT,” https://openai.com/blog/chatgpt (Accessed Mar. 17, 2023).
    [10] OpenAI, “DALL·E 2 pre-training mitigations,” https://openai.com/research/dall-e-2-pre-training-mitigations (Accessed Apr. 2, 2023).
    [11] Stability AI, “Stable Diffusion Public Release,” https://stability.ai/blog/stable-diffusion-public-release (Accessed Apr. 2, 2023).
    [12] Midjourney, https://www.midjourney.com/home/?callbackUrl=%2Fapp%2F (Accessed Apr. 2, 2023).
    [13] NovelAI - The AI Storyteller, https://novelai.net/ (Accessed Apr. 2, 2023).
    [14] StabilityAI, “Stable Diffusion 2-1 - a Hugging Face Space,” https://huggingface.co/spaces/stabilityai/stable-diffusion (Accessed Apr. 10, 2023).
    [15] D. Harwell, “He used AI art from Midjourney to win a fine-arts prize. Did he cheat?,” https://www.washingtonpost.com/technology/2022/09/02/midjourney-artificial-intelligence-state-fair-colorado/ (Accessed Apr. 10, 2023).
    [16] V. Babbs, “Digital Artists Are Pushing Back Against AI,” https://hyperallergic.com/806026/digital-artists-are-pushing-back-against-ai/ (Accessed Apr. 10, 2023).
    [17] CTS News (華視新聞網), “AI智慧淪詐騙工具? 中國政府出手「強力監管」” [AI reduced to a fraud tool? The Chinese government steps in with “strong regulation”], https://reurl.cc/XL3L9R (Accessed Apr. 10, 2023).
    [18] CompVis/stable-diffusion, https://github.com/CompVis/stable-diffusion (Accessed Apr. 12, 2023).
    [19] R. Rombach, A. Blattmann, D. Lorenz, P. Esser and B. Ommer, “High-Resolution Image Synthesis with Latent Diffusion Models,” Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022, pp. 10674-10685, doi: 10.1109/CVPR52688.2022.01042.
    [20] J. Alammar, “The Illustrated Stable Diffusion,” https://jalammar.github.io/illustrated-stable-diffusion/ (Accessed May 5, 2023).
    [21] OpenAI, “CLIP: Connecting text and images,” https://openai.com/research/clip (Accessed May 5, 2023).
    [22] O. Ronneberger, P. Fischer and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” arXiv:1505.04597, 2015.
    [23] Huggingface, “Schedulers,” https://huggingface.co/docs/diffusers/using-diffusers/schedulers (Accessed May 16, 2023).
    [24] K. Zhang, Z. Zhang, Z. Li and Y. Qiao, “Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks,” IEEE Signal Processing Letters (SPL), vol. 23, no. 10, pp. 1499-1503, 2016.
    [25] H. Jiang and E. Learned-Miller, “Face Detection with the Faster R-CNN,” Proceedings of the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 2017, pp. 650-657.
    [26] M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” arXiv:1905.11946.
    [27] J. Li, Y. Wang, C. Wang, Y. Tai, J. Qian, J. Yang, C. Wang, J. Li and F. Huang, “DSFD: Dual Shot Face Detector,” Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 5055-5064.
    [28] J. Deng, J. Guo, Y. Zhou, J. Yu, I. Kotsia and S. Zafeiriou, “RetinaFace: Single-stage Dense Face Localisation in the Wild,” arXiv:1905.00641.
    [29] Y. Choi, T. Kim and S. Choi, “Automatic detection for javascript obfuscation attacks in Web pages through string pattern analysis,” International Journal of Security and Its Applications, vol. 4, no. 2, pp. 160-172.
    [30] Kaggle, “Synthetic Faces High Quality (SFHQ) part 4,” https://www.kaggle.com/datasets/selfishgene/synthetic-faces-high-quality-sfhq-part-4 (Accessed Jun. 20, 2023).
    [31] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen and T. Aila, “Analyzing and Improving the Image Quality of StyleGAN,” arXiv:1912.04958.
    [32] Kaggle, “Flickr-Faces-HQ Dataset (FFHQ),” https://www.kaggle.com/datasets/arnaud58/flickrfaceshq-dataset-ffhq?resource=download (Accessed Jun. 20, 2023).
    [33] IMDB, “IMDB-WIKI – 500k+ face images with age and gender labels,” https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/ (Accessed Jun. 20, 2023).
    [34] Kaggle, “Pins Face Recognition,” https://www.kaggle.com/datasets/hereisburak/pins-face-recognition (Accessed Jun. 20, 2023).
    [35] Kaggle, “LFW - People (Face Recognition),” https://www.kaggle.com/datasets/atulanandjha/lfwpeople (Accessed Jun. 20, 2023).
    [36] Kaggle, “Biggest gender/face recognition dataset,” https://www.kaggle.com/datasets/maciejgronczynski/biggest-genderface-recognition-dataset (Accessed Jun. 20, 2023).
    [37] Kaggle, “Spotting Diffusion Main Dataset,” https://www.kaggle.com/datasets/sahalmulki/stable-diffusion-generated-images (Accessed Jun. 20, 2023).
    [38] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang and W. Chen, “LoRA: Low-Rank Adaptation of Large Language Models,” arXiv:2106.09685 (Accessed Jun. 20, 2023).
