Author: | Wei-Xiang Huang (黃偉翔) |
Thesis Title: | An Attention-Based Photo Composition Improvement Method: Implementation Using Residual Neural Network Architecture |
Advisor: | Chin-Shyurng Fahn |
Committee: | Sheng-Jyh Wang, Jun-Wei Hsieh (謝君偉), Huei-Wen Ferng (馮輝文) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
Thesis Publication Year: | 2020 |
Graduation Academic Year: | 108 (ROC calendar, i.e., academic year 2019-2020) |
Language: | English |
Pages: | 68 |
Keywords: | Attention-based Image Cropping, Residual Network, Convolutional Neural Network, Aesthetic Assessment, Photo Composition Classification, CNN Visualization |
Learning photography has traditionally required a great deal of time. Moreover, there is no rigorous definition of what makes a good photo; judging one relies largely on human experience. Motivated by these observations, we aim to develop an automatic photo improvement method that helps novices learn photography without an instructor.
We focus on adjusting image composition because we believe composition is the most important element of photo aesthetics. In this thesis, we first apply a residual convolutional neural network (residual CNN) to classify image composition. Once the composition of a photo has been identified, we aim to improve the photo's quality; to this end, we propose an attention-based image cropping method. Here, “attention” means finding the important regions or objects in an image. Determining the main object or theme of a given image, however, is a nontrivial task that requires deep domain knowledge and sophisticated skills, so we instead use CNN visualization to locate the important regions. Because the aesthetic assessment of a cropped image is highly subjective, and different viewers may hold different opinions, we generate a batch of cropping candidates and evaluate them with our deep learning model. For each candidate, a classifier or a ranker scores its visual/aesthetic quality, and the candidate with the highest score is taken as the optimal cropping result.
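The attention-then-rank idea above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a Grad-CAM-style attention map (one of the CNN visualization techniques in the reference list [11]) computed from given feature maps and gradients, a simple sliding-window candidate generator, and a filter that keeps only crops retaining most of the attention mass. The function names, the `min_coverage` threshold, the scale/step grid, and the toy `score_fn` are all hypothetical stand-ins; in the described method, the score would come from the trained classifier or ranker.

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM-style map from activations A of shape (K, h, w) and the
    gradients of a class score with respect to A (same shape)."""
    # One weight per channel: the spatially averaged gradient.
    weights = gradients.mean(axis=(1, 2))                        # (K,)
    # Weighted sum over channels; ReLU keeps only positive evidence.
    cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam                       # scale to [0, 1]

def upsample_nearest(cam: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbour resize of the coarse map to the image size."""
    rows = np.arange(height) * cam.shape[0] // height
    cols = np.arange(width) * cam.shape[1] // width
    return cam[np.ix_(rows, cols)]

def generate_candidates(height, width, scales=(0.6, 0.7, 0.8, 0.9), steps=5):
    """Hypothetical sliding-window boxes (x1, y1, x2, y2) that keep the image aspect ratio."""
    boxes = []
    for s in scales:
        ch, cw = int(round(height * s)), int(round(width * s))
        for y in np.linspace(0, height - ch, steps).astype(int):
            for x in np.linspace(0, width - cw, steps).astype(int):
                boxes.append((int(x), int(y), int(x) + cw, int(y) + ch))
    return boxes

def best_crop(attention, boxes, score_fn, min_coverage=0.8):
    """Keep boxes that retain at least min_coverage of the attention mass, then
    return the one with the highest score_fn value ((None, -inf) if none qualifies)."""
    total = attention.sum()
    best, best_score = None, -np.inf
    for x1, y1, x2, y2 in boxes:
        kept = attention[y1:y2, x1:x2].sum() / total if total > 0 else 1.0
        if kept < min_coverage:
            continue  # this crop cuts away too much of the important region
        score = score_fn((x1, y1, x2, y2))
        if score > best_score:
            best, best_score = (x1, y1, x2, y2), score
    return best, best_score

# Toy usage: one "hot" activation and a placeholder scorer that simply prefers
# tighter crops (a trained aesthetic model would be used here instead).
A = np.zeros((2, 7, 7)); A[0, 3, 3] = 1.0
G = np.ones((2, 7, 7))
attention = upsample_nearest(grad_cam(A, G), 224, 224)
boxes = generate_candidates(224, 224)
box, score = best_crop(attention, boxes, lambda b: -(b[2] - b[0]) * (b[3] - b[1]))
print(box)  # (0, 0, 134, 134)
```

Filtering by attention coverage before scoring reflects the abstract's two-stage design: the visualization step decides which regions must survive the crop, and only the surviving candidates are ranked for aesthetic quality.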
For photo composition classification, we evaluate our method on our dataset of 4,361 photos with different resolutions, covering seven composition types. On the training set of 3,661 photos, the average recall and the average precision are both 99.9%; on the test set of 700 photos, both are 89.1%. We then evaluate image cropping on the CUHK-ICD dataset, which contains 950 photos of various resolutions, and compare our method with EnhanceGAN [15] and the view finding network (VFN) [16] using two metrics: intersection over union (IoU) and boundary displacement error (BDE). Our attention-based model achieves the best performance, with an average IoU of 0.787 and an average BDE of 0.059.
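The evaluation metrics named above can be computed as in the following sketch, which follows common conventions rather than code from the thesis. Precision and recall are macro-averaged over the composition classes (an assumption, since the abstract does not state the averaging scheme); IoU compares a predicted crop box with the ground-truth box; and BDE averages the four edge offsets, with x offsets normalized by image width and y offsets by image height. Boxes are assumed to be (x1, y1, x2, y2) pixel coordinates, and the helper names are illustrative.

```python
import numpy as np

def macro_precision_recall(y_true, y_pred, num_classes):
    """Macro-averaged precision and recall from integer class labels."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                      # rows: ground truth, columns: prediction
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)             # how often each class was predicted
    actual = cm.sum(axis=1)                # how often each class actually occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return float(precision.mean()), float(recall.mean())

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def bde(box_pred, box_gt, width, height):
    """Boundary displacement error: mean offset of the four crop edges,
    x offsets normalized by image width, y offsets by image height."""
    dx = abs(box_pred[0] - box_gt[0]) + abs(box_pred[2] - box_gt[2])
    dy = abs(box_pred[1] - box_gt[1]) + abs(box_pred[3] - box_gt[3])
    return (dx / width + dy / height) / 4.0

gt, pred = (0, 0, 100, 100), (50, 0, 150, 100)
print(iou(pred, gt))            # 0.333...
print(bde(pred, gt, 200, 100))  # 0.125
print(macro_precision_recall([0, 0, 1, 1], [0, 1, 1, 1], 2))  # about (0.833, 0.75)
```

Higher IoU and lower BDE both indicate a crop closer to the human-annotated one, which is why the two are reported together.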
[1] B. Cheng et al., “Learning to photograph,” in Proceedings of the 18th Association for Computing Machinery International Conference on Multimedia, Firenze, Italy, pp. 291-300, 2010.
[2] D. Li et al., “A2-RL: Aesthetics aware reinforcement learning for image cropping,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, pp. 8193-8201, 2018.
[3] M. Jiang et al., “SALICON: Saliency in context,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts, pp. 1072-1080, 2015.
[4] W. Wang and J. Shen, “Deep cropping via attention box prediction and aesthetics assessment,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 2186-2194, 2017.
[5] R. Datta et al., “Studying aesthetics in photographic images using a computational approach,” in Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, pp. 288-301, 2006.
[6] N. Murray, L. Marchesotti, and F. Perronnin, “AVA: A large-scale database for aesthetic visual analysis,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, Rhode Island, pp. 2408-2415, 2012.
[7] Q. Guan et al., “Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification,” 2018, arXiv:1801.09927.
[8] J. T. Springenberg et al., “Striving for simplicity: The all convolutional net,” 2014, arXiv:1412.6806.
[9] B. Zhou et al., “Learning deep features for discriminative localization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, pp. 2921-2929, 2016.
[10] Y. L. Chen et al., “Quantitative analysis of automatic image cropping algorithms: A dataset and comparative study,” in Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Santa Rosa, California, pp. 226-234, 2017.
[11] R. R. Selvaraju et al., “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 618-626, 2017.
[12] R. Yamashita et al., “Convolutional neural networks: An overview and application in radiology,” Insights into Imaging, vol. 9, no. 4, pp. 611-629, 2018.
[13] Y. Deng, C. C. Loy, and X. Tang, “Image aesthetic assessment: An experimental survey,” 2016, arXiv:1610.00838.
[14] A. Chattopadhyay et al., “Grad-CAM++: Improved visual explanations for deep convolutional networks,” in Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, Nevada, pp. 839-847, 2018.
[15] Y. Deng, C. C. Loy, and X. Tang, “Aesthetic-driven image enhancement by adversarial learning,” in Proceedings of the 26th Association for Computing Machinery International Conference on Multimedia, Seoul, Republic of Korea, pp. 870-878, 2018.
[16] Y. L. Chen et al., “Learning to compose with professional photographs on the web,” in Proceedings of the 25th Association for Computing Machinery International Conference on Multimedia, Mountain View, California, pp. 37-45, 2017.
[17] P. Lu et al., “An end-to-end neural network for image cropping by learning composition from aesthetic photos,” 2019, arXiv:1907.01432.
[18] T. Hu et al., “See better before looking closer: Weakly supervised data augmentation network for fine-grained visual classification,” 2019, arXiv:1901.09891.
[19] Z. Wei et al., “Good view hunting: Learning photo composition from dense view pairs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, pp. 5437-5446, 2018.