An Attention-Based Photo Composition Improvement Method: Implementation Using Residual Neural Network Architecture


Graduate student:	黃偉翔 Wei-Xiang Huang
Thesis title:	An Attention-Based Photo Composition Improvement Method: Implementation Using Residual Neural Network Architecture
Advisor:	范欽雄 Chin-Shyurng Fahn
Oral defense committee:	王聖智 Sheng-Jyh Wang, 謝君偉 Jun-Wei Hsieh, 馮輝文 Huei-Wen Ferng
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2020
Graduation academic year:	108
Language:	English
Number of pages:	68
Keywords:	Attention-based Image Cropping, Residual Network, Convolutional Neural Network, Aesthetic Assessment, Photo Composition Classification, CNN Visualization

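Learning photography used to take a great deal of time, and there is no strict definition of how to take a good photo; it usually depends on human experience. For these reasons, we propose an automatic photo composition improvement method to help novices learn photography without hands-on guidance.
We focus on adjusting the composition of an image because we consider composition the most important element of photo aesthetics. This thesis first applies a residual convolutional neural network (Residual CNN) to classify image composition. Once a photo's composition has been identified, we want to improve its quality, so we develop an attention-based image cropping method. “Attention” means finding the important regions or objects in an image; however, identifying an image's main object or theme is a difficult task that requires deep domain knowledge and refined skill, so this thesis uses CNN visualization to locate the important regions. Aesthetic assessment of a cropped image is highly subjective, and different viewers may hold different opinions; we therefore generate a batch of cropping candidates and evaluate them with our deep learning model. Each candidate's visual or aesthetic quality is rated by a classifier or a ranker, and the highest-scoring candidate is taken as the best cropping result.
For photo composition classification, we evaluate the method on a dataset of 4,361 photos of varying resolutions covering seven composition types. On the set of 3,661 photos, the average recall is 99.9% and the average precision is 99.9%; on the test set of 700 photos, the average recall is 89.1% and the average precision is 89.1%. We then use the CUHK-ICD dataset, which contains 950 photos of varying resolutions, to evaluate image cropping performance under two metrics, IoU and BDE. Compared with EnhanceGAN and the view finding network (VFN), our attention-based model achieves the best cropping performance, with an average IoU of 0.78 and an average BDE of 0.059.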

Learning photography has traditionally taken a great deal of time. Moreover, there is no rigorous definition of how to capture a good photo; it usually relies on human experience. Motivated by these observations, we develop an automatic photo composition improvement method to help novices learn photography without an instructor.
We focus on adjusting image composition because we believe composition is the most important element of photo aesthetics. In this thesis, we first apply a residual convolutional neural network to classify image composition. Once the composition of a photo has been identified, we want to improve the photo's quality, and for this purpose we propose an attention-based image cropping method. “Attention” means finding the important regions or objects in an image. However, determining the main object or theme of a given image is a nontrivial task that requires deep domain knowledge and sophisticated skills, so we use CNN visualization to find the important regions. Aesthetic assessment of a cropped image is highly subjective, and different viewers may hold different opinions. Therefore, we generate a batch of cropping candidates and evaluate them with our deep learning model. For each candidate, a classifier or a ranker evaluates its visual or aesthetic quality, and the candidate with the highest score is taken as the optimal cropping result.
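The cropping pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the heat map is computed with the standard Grad-CAM weighting (the thesis contents list Grad-CAM, Grad-CAM++, and a binary mask step), and the function names `grad_cam`, `attention_box`, `cropping_candidates`, and `best_crop`, the `threshold` and `margins` parameters, and the scoring callback are all hypothetical. In the thesis the candidates are scored by a deep learning model; here any callable can stand in for it.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom are exclusive
Map2D = Sequence[Sequence[float]]


def grad_cam(activations: Sequence[Map2D], gradients: Sequence[Map2D]) -> List[List[float]]:
    """Grad-CAM heat map from K feature maps and the class-score gradients w.r.t. them."""
    height, width = len(activations[0]), len(activations[0][0])
    heat = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight = global average of the gradient over all spatial positions.
        weight = sum(sum(row) for row in grad) / (height * width)
        for y in range(height):
            for x in range(width):
                heat[y][x] += weight * fmap[y][x]
    # ReLU keeps only the locations that support the class score.
    return [[max(0.0, v) for v in row] for row in heat]


def attention_box(heatmap: Map2D, threshold: float = 0.5) -> Box:
    """Binary-mask the heat map at threshold * peak and return the mask's bounding box."""
    cutoff = threshold * max(max(row) for row in heatmap)
    rows = [y for y, row in enumerate(heatmap) if any(v >= cutoff for v in row)]
    cols = [x for x in range(len(heatmap[0])) if any(row[x] >= cutoff for row in heatmap)]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def cropping_candidates(box: Box, width: int, height: int, margins: Sequence[int]) -> List[Box]:
    """Grow the attention box by each margin, clip to the image, and drop duplicates."""
    left, top, right, bottom = box
    candidates: List[Box] = []
    for m in margins:
        cand = (max(0, left - m), max(0, top - m), min(width, right + m), min(height, bottom + m))
        if cand not in candidates:
            candidates.append(cand)
    return candidates


def best_crop(candidates: Sequence[Box], score: Callable[[Box], float]) -> Box:
    """Return the candidate with the highest score (the scorer is a stand-in here)."""
    return max(candidates, key=score)
```

A caller would compute the heat map from the classifier's last convolutional layer, derive the attention box, enumerate candidates around it, and pass a scoring function that wraps whatever aesthetic model is available. How the thesis actually enumerates candidates is not stated in the abstract, so the margin-based expansion is only one plausible choice.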
For photo composition classification, we evaluate our method on our own dataset of 4,361 photos with different resolutions, covering seven composition types. On the training set of 3,661 photos, the average recall is 99.9% and the average precision is 99.9%; on the testing set of 700 photos, the average recall is 89.1% and the average precision is 89.1%. Subsequently, we use the CUHK-ICD dataset of 950 photos with various resolutions to evaluate image cropping performance. We compare our image cropping method with EnhanceGAN and the view finding network (VFN) using two metrics, IoU and BDE. Our attention-based model achieves the best performance, with an average IoU of 0.787 and an average BDE of 0.059.
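For readers unfamiliar with the two cropping metrics, the sketch below shows how they are commonly computed for a predicted crop and a ground-truth crop. IoU is the standard intersection-over-union of the two rectangles. BDE (boundary displacement error) is taken here as the mean displacement of the four crop edges, each normalized by the image width or height; this is the usual definition in the image cropping literature, but the abstract does not spell out the exact formula, so treat it as an assumption. The function names `iou` and `bde` are illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (higher is better)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def bde(pred: Box, truth: Box, width: float, height: float) -> float:
    """Mean normalized displacement of the four crop edges (lower is better)."""
    # Left and right edges are normalized by the image width.
    dx = abs(pred[0] - truth[0]) + abs(pred[2] - truth[2])
    # Top and bottom edges are normalized by the image height.
    dy = abs(pred[1] - truth[1]) + abs(pred[3] - truth[3])
    return (dx / width + dy / height) / 4.0
```

Averaging both values over every image in the evaluation set would give dataset-level figures of the kind reported above.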

Abstract (Chinese)
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1  Introduction
1 Overview
2 Motivation
3 System Description
4 Thesis Organization
Chapter 2  Related Work
1 Image Cropping
1.1 Attention-based method
1.2 Aesthetics-based method
2 CNN Visualization
Chapter 3  Composition Classification Method
1 Type of Composition
1.1 Central composition
1.2 Diagonal composition
1.3 Horizontal composition
1.4 Vanishing point composition
1.5 Rule of thirds composition
1.6 Symmetry composition
1.7 Vertical composition
2 Convolutional Neural Networks
2.1 Convolutional layer
2.2 Pooling layer
2.3 Fully connected network
2.4 Residual network
2.5 Our model
Chapter 4  Composition Improvement Method
1 Attention Box Generation
1.1 Gradient-weighted class activation mapping
1.2 Grad-CAM++
1.3 Binary mask
2 Cropping Candidates
Chapter 5  Experimental Results and Discussions
1 Experimental Setup
1.1 Evaluation data and metrics
2 Results of Composition Classification
3 Results of Composition Improvement
Chapter 6  Conclusions and Future Work
1 Conclusions
2 Future Work
References


                                


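Full-text release date: 2025/08/18 (campus network)
Full-text release date: 2025/08/18 (off-campus network)
Full-text release date: 2030/08/18 (National Central Library: Taiwan master's and doctoral theses system)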
