
Student: 黃偉翔 (Wei-Xiang Huang)
Thesis Title: 一個基於專注力的相片構圖改進方法—使用殘差神經網路架構實現
An Attention-Based Photo Composition Improvement Method —Implementation Using Residual Neural Network Architecture
Advisor: 范欽雄 (Chin-Shyurng Fahn)
Committee Members: 王聖智 (Sheng-Jyh Wang), 謝君偉 (Jun-Wei Hsieh), 馮輝文 (Huei-Wen Ferng)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108 (2019–2020)
Language: English
Number of Pages: 68
Keywords: Attention-based Image Cropping, Residual Network, Convolutional Neural Network, Aesthetic Assessment, Photo Composition Classification, CNN Visualization


    In the past, learning photography took a great deal of time. Moreover, there is no rigorous definition of what makes a good photo; judging one usually relies on human experience. Motivated by these observations, we develop an automatic photo composition improvement method to help novices learn photography without an actual instructor.
    We focus on adjusting image composition because we believe composition is the most important element of photo aesthetics. In this thesis, we first apply a residual convolutional neural network to classify image composition. Once the composition of a photo has been identified, we want to improve the photo's quality; for this purpose, we propose an attention-based image cropping method. “Attention” here means finding the important regions or objects in an image. However, determining the main object or theme of a given image is a nontrivial task that requires deep domain knowledge and sophisticated skills, so we resort to CNN visualization to locate important regions. Because the aesthetic assessment of a cropped image is highly subjective and different viewers may hold different opinions, we generate a batch of cropping candidates and evaluate them with our deep learning model. For each candidate, a classifier or a ranker evaluates its visual and aesthetic quality, and the candidate with the highest score is taken as the optimal cropping result.
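    To make this pipeline concrete, the following is a minimal PyTorch sketch, assuming a ResNet-50 backbone; the layer choices, candidate ratios, and scoring rule are illustrative assumptions, not the exact configuration used in the thesis. The final layer is replaced by a seven-class composition head, Grad-CAM over the last convolutional block provides the attention map, and a sliding-window scan rates crop candidates by attention density as a hypothetical stand-in for the learned classifier/ranker described above.

import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

NUM_COMPOSITIONS = 7  # central, diagonal, horizontal, vanishing point,
                      # rule of thirds, symmetry, vertical

# Pretrained backbone; the new 7-class head is randomly initialized here and
# would be fine-tuned on the composition dataset in practice.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_COMPOSITIONS)
model.eval()

# Cache the last convolutional block's activations and gradients for Grad-CAM.
feats, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(x=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(x=go[0]))

def attention_map(image):
    """Grad-CAM heat map (H, W) in [0, 1] for the predicted class; image is a
    normalized (1, 3, H, W) tensor."""
    logits = model(image)
    model.zero_grad()
    logits[0, logits[0].argmax()].backward()
    weights = grads["x"].mean(dim=(2, 3), keepdim=True)   # channel importance
    cam = F.relu((weights * feats["x"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

def best_crop(image, ratios=(0.6, 0.7, 0.8, 0.9), stride=16):
    """Slide fixed-ratio windows over the image and keep the one with the
    densest attention; a stand-in for the learned aesthetic scorer."""
    cam = attention_map(image)
    H, W = cam.shape
    best_box, best_score = None, -1.0
    for r in ratios:
        h, w = int(H * r), int(W * r)
        for y in range(0, H - h + 1, stride):
            for x in range(0, W - w + 1, stride):
                score = cam[y:y + h, x:x + w].mean().item()
                if score > best_score:
                    best_box, best_score = (x, y, x + w, y + h), score
    return best_box  # (x1, y1, x2, y2) of the selected crop

    In the full method, the density heuristic above would be replaced by model-based aesthetic scoring of each candidate, with the attention box constraining which windows are considered.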
    In photo composition classification, we evaluate our method on a dataset of seven composition types comprising 4,361 photos of various resolutions. On the training set of 3,661 photos, both the average recall and the average precision are 99.9%; on the testing set of 700 photos, both are 89.1%. Subsequently, we adopt the CUHK-ICD dataset of 950 photos of various resolutions to evaluate image cropping performance under two metrics, IoU and BDE. Compared with EnhanceGAN and the view finding network (VFN), our attention-based model achieves the best cropping performance, with an average IoU of 0.787 and an average BDE of 0.059.
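    For reference, below is a short sketch of the two metrics named above under their usual definitions: IoU is the intersection-over-union of the predicted and ground-truth crop windows, and BDE is the mean displacement of the four box edges normalized by the image width and height (the thesis may normalize slightly differently).

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union

def bde(box_pred, box_gt, width, height):
    """Mean displacement of the four box edges, normalized by image size."""
    dx = (abs(box_pred[0] - box_gt[0]) + abs(box_pred[2] - box_gt[2])) / width
    dy = (abs(box_pred[1] - box_gt[1]) + abs(box_pred[3] - box_gt[3])) / height
    return (dx + dy) / 4

    For example, iou((0, 0, 80, 60), (10, 10, 90, 70)) is about 0.57, while a perfect crop yields IoU = 1 and BDE = 0, matching the direction of the averages reported above (higher IoU and lower BDE are better).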

    Chinese Abstract i
    Abstract ii
    Acknowledgements iv
    Contents v
    List of Figures vii
    List of Tables x
    Chapter 1 Introduction 1
    1.1 Overview 1
    1.2 Motivation 2
    1.3 System Description 2
    1.4 Thesis Organization 3
    Chapter 2 Related Work 4
    2.1 Image Cropping 4
    2.1.1 Attention-based method 5
    2.1.2 Aesthetics-based method 6
    2.2 CNN Visualization 7
    Chapter 3 Composition Classification Method 10
    3.1 Type of Composition 10
    3.1.1 Central composition 11
    3.1.2 Diagonal composition 11
    3.1.3 Horizontal composition 12
    3.1.4 Vanishing point composition 12
    3.1.5 Rule of thirds composition 13
    3.1.6 Symmetry composition 13
    3.1.7 Vertical composition 14
    3.2 Convolutional Neural Networks 14
    3.2.1 Convolutional layer 15
    3.2.2 Pooling layer 19
    3.2.3 Fully connected network 20
    3.2.4 Residual network 20
    3.2.5 Our model 22
    Chapter 4 Composition Improvement Method 24
    4.1 Attention Box Generation 25
    4.1.1 Gradient-weighted class activation mapping 26
    4.1.2 Grad-CAM++ 29
    4.1.3 Binary mask 32
    4.2 Cropping Candidates 33
    Chapter 5 Experimental Results and Discussions 35
    5.1 Experimental Setup 35
    5.1.1 Evaluation data and metrics 36
    5.2 Results of Composition Classification 39
    5.3 Results of Composition Improvement 46
    Chapter 6 Conclusions and Future Work 52
    6.1 Conclusions 52
    6.2 Future Work 54
    References 55

    [1] B. Cheng et al., “Learning to photograph,” in Proceedings of the 18th Association for Computing Machinery Conference on Multimedia, Firenze, Italy, pp. 291-300, 2010.
    [2] D. Li et al., “A2-RL: Aesthetics aware reinforcement learning for image cropping,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, pp. 8193-8201, 2018.
    [3] M. Jiang et al., “SALICON: Saliency in context,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts, pp. 1072-1080, 2015.
    [4] W. Wang and J. Shen, “Deep cropping via attention box prediction and aesthetics assessment,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 2186-2194, 2017.
    [5] R. Datta et al., “Studying aesthetics in photographic images using a computational approach,” in Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, pp. 288-301, 2006.
    [6] N. Murray, F. Marchesotti, and F. Perronnin, “AVA: A large-scale database for aesthetic visual analysis,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, Rhode Island, pp. 2408-2415, 2012.
    [7] Q. Guan et al., “Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification,” 2018, arXiv:1801.09927.
    [8] J. T. Springenberg et al., “Striving for simplicity: The all convolutional net,” 2014, arXiv:1412.6806.
    [9] B. Zhou et al., “Learning deep features for discriminative localization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, pp. 2921-2929, 2016.
    [10] Y. L. Chen, T. W. Huang, and K. H. Chang, “Quantitative analysis of automatic image cropping algorithms,” in Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Santa Rosa, California, pp. 226-234, 2017.
    [11] R. R. Selvaraju et al., “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 618-626, 2017.
    [12] R. Yamashita et al., “Convolutional neural networks: An overview and application in radiology,” Insights into Imaging, vol. 9, no. 4, pp. 611-629, 2018.
    [13] Y. Deng, C. C. Loy, and X. Tang, “Image aesthetic assessment: An experimental survey,” 2016, arXiv:1610.00838.
    [14] A. Chattopadhyay et al., “Grad-CAM++: Improved visual explanations for deep convolutional networks,” in Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, Nevada, pp. 839-847, 2018.
    [15] Y. Deng, C. C. Loy, and X. Tang, “Aesthetic-driven image enhancement by adversarial learning,” in Proceedings of the 26th Association for Computing Machinery International Conference on Multimedia, Seoul, Republic of Korea, pp. 870-878, 2018.
    [16] Y. L. Chen et al., “Learning to compose with professional photographs on the web,” in Proceedings of the 25th Association for Computing Machinery International Conference on Multimedia, Mountain View, California, pp. 37-45, 2017.
    [17] P. Lu et al., “An end-to-end neural network for image cropping by learning composition from aesthetic photos,” 2019, arXiv:1907.01432.
    [18] T. Hu et al., “See better before looking closer: Weakly supervised data augmentation network for fine-grained visual classification,” 2019, arXiv:1901.09891.
    [19] Z. Wei et al., “Good view hunting: Learning photo composition from dense view pairs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, pp. 5437-5446, 2018.

    Full-text release date: 2025/08/18 (campus network)
    Full-text release date: 2025/08/18 (off-campus network)
    Full-text release date: 2030/08/18 (National Central Library: Taiwan Networked Digital Library of Theses and Dissertations)