Graduate student: | 吳孟倫 Meng-Luen Wu |
---|---|
Thesis title: | 應用樹狀分類器與類神經網路於影像構圖與調性風格美學評價的當代專業攝影指引之研究 On Professional Contemporary Style Photographing Instruction Based on Neural Tree Based Classifiers Applied to Image Aesthetics Assessment |
Advisor: | 范欽雄 Chin-Shyurng Fahn |
Oral defense committee: | 陳祝嵩 Chu-Song Chen, 施仁忠 Zen-Chung Shih, 李同益 Tong-Yee Lee, 王榮華 Jung-Hua Wang, 馮輝文 Huei-Wen Ferng, 謝仁偉 Jen-Wei Hsieh, 范欽雄 Chin-Shyurng Fahn |
Degree: | Doctor (博士) |
Department: | College of Electrical Engineering and Computer Science (電資學院), Department of Computer Science and Information Engineering (資訊工程系) |
Publication year: | 2017 |
Academic year of graduation: | 105 (ROC calendar) |
Language: | English |
Pages: | 155 |
Chinese keywords: | 計量審美學、資料探勘、決策樹、隨機森林、類神經網路 |
English keywords: | Computational aesthetics, data mining, decision tree, random forest, artificial neural networks |
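This dissertation studies how machine learning and other artificial intelligence techniques can let a computer acquire the abstract human concept of beauty, and builds on that a photographing instruction system that teaches camera users to take photos at the level of a professional photographer. We collect professional photos that have been popular on Internet communities in recent years and apply data mining algorithms to them in order to analyze contemporary aesthetic standards. The aesthetic analysis has two aspects: image characteristics (tonal style) and image composition. For image characteristics, we analyze the color, brightness, contrast, and texture of an image to judge whether it meets contemporary aesthetic standards. For image composition, we analyze the structure of an image to judge whether it matches a composition type commonly used by professional photographers.
The proposed photographing instruction system consists of tree-based classifiers and artificial neural networks. A random forest neural network predicts whether an input image meets contemporary aesthetic standards; if it does not, the system analyzes the decision path and automatically gives a few correction suggestions, so that the user can easily turn the input photo into a high-quality one. The trees that give suggestions are mainly binary decision trees. Compared with a neural network, a decision tree has a decision process that can be explained semantically, but its data splits are axis-aligned, which limits classification accuracy. In this dissertation, we integrate neural networks into the decision tree and then extend the decision tree into a random decision forest, which combines the advantages of both and greatly improves classification accuracy. We also discuss the limitations under which no instruction can be given. Instruction takes two forms: for image characteristics, the system states semantically which features of the tested image should be strengthened or weakened; for image composition, it draws blocks on the input image as hints for strengthening or weakening the features of particular regions.
In the experiments, we predict whether an image is popular on social networks. Using the proposed image characteristics features or image composition features alone achieves an accuracy above 85%, and combining the two achieves an accuracy above 91%. The highest accuracy is obtained when a random forest combined with neural networks is used as the classifier. In addition, the proposed method gives effective photographing instructions for both image characteristics and image composition, making the tonal style more harmonious, the composition more balanced, and the main subject more prominent.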
In this dissertation, we study how to use artificial intelligence and data mining techniques to enable computers to perceive the abstract concept of beauty, and we design a photographing instruction system accordingly. For the analysis, we collect contemporary-style images posted on social networks in recent years. The system gives two kinds of instruction: one on image characteristics and the other on image composition. Image characteristics refer to color and texture, while image composition refers to the structure of an image.
Our proposed photographing instructor is composed of tree-based classifiers and artificial neural networks, which together form a random forest that predicts whether an image meets the criteria of the contemporary style. Binary decision trees are built to give photographing instructions. However, a decision tree splits the data only along feature axes (axis-aligned splits), which limits its accuracy. Therefore, we combine the decision tree with neural networks and build multiple random trees from subsets to form a random forest, which improves the accuracy. We also describe the limitations of the instruction system. The system gives users semantic sentences for enhancing image characteristics, and uses blocks to indicate which regions should be improved for image composition.
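The axis-aligned limitation mentioned above can be illustrated with a small sketch. It is not the dissertation's classifier: the toy data, the function names, and the hand-picked linear unit (standing in for a neural-network node) are all assumptions made for illustration. On data whose class boundary is diagonal, no single test of the form "feature <= threshold" separates the classes, while a single linear combination of the two features does.

```python
from itertools import product


def best_stump_accuracy(points, labels):
    """Best accuracy of one axis-aligned test `x[f] <= t`, trying both polarities."""
    n = len(points)
    best = 0.0
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            # Polarity A predicts class 1 when x[f] <= t; polarity B is its complement.
            correct = sum((p[f] <= t) == bool(y) for p, y in zip(points, labels))
            best = max(best, correct / n, (n - correct) / n)
    return best


def oblique_accuracy(points, labels, weights, bias):
    """Accuracy of one linear unit that predicts class 1 when w . x + b > 0."""
    correct = 0
    for p, y in zip(points, labels):
        activation = sum(w * x for w, x in zip(weights, p)) + bias
        correct += (activation > 0) == bool(y)
    return correct / len(points)


# Toy data (hypothetical): class 1 when the second feature exceeds the first,
# so the true boundary is the diagonal x1 = x0.
points = [(a, b) for a, b in product(range(10), repeat=2) if a != b]
labels = [int(b > a) for a, b in points]

stump_acc = best_stump_accuracy(points, labels)
oblique_acc = oblique_accuracy(points, labels, weights=(-1.0, 1.0), bias=0.0)
print(f"best axis-aligned split: {stump_acc:.2f}, oblique split: {oblique_acc:.2f}")
```

A deeper axis-aligned tree can approximate the diagonal with a staircase of splits, at the cost of many more nodes. Here the weights are fixed by hand; in a learned model they would be fitted to the data.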
In the experiments, we predict whether an image is favorable. Using image characteristics features or composition features alone achieves an accuracy above 85%; combining the two types of features raises the accuracy above 91%. In addition, the proposed instruction system gives correct suggestions: after they are applied, the colors are more harmonized, the compositions are more balanced, and the main subjects are enhanced.
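The idea of turning a binary decision tree's decision path into a semantic suggestion can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the abstract: the feature names (`contrast`, `saturation`), the thresholds, the tree shape, and the `suggest` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    favorable: bool


@dataclass
class Node:
    feature: str                # hypothetical feature name
    threshold: float            # hypothetical split value
    low: Union["Node", Leaf]    # branch taken when value <= threshold
    high: Union["Node", Leaf]   # branch taken when value > threshold


def decision_path(tree, image):
    """Return (leaf, path); path lists (node, went_high) for every test visited."""
    path = []
    node = tree
    while isinstance(node, Node):
        went_high = image[node.feature] > node.threshold
        path.append((node, went_high))
        node = node.high if went_high else node.low
    return node, path


def has_favorable_leaf(tree):
    if isinstance(tree, Leaf):
        return tree.favorable
    return has_favorable_leaf(tree.low) or has_favorable_leaf(tree.high)


def suggest(tree, image):
    """Return None if the image is already favorable, else a one-line hint."""
    leaf, path = decision_path(tree, image)
    if leaf.favorable:
        return None
    # Walk back from the leaf to the deepest test whose other branch can still
    # reach a favorable leaf. This names only the first feature to adjust;
    # further tests in that branch may require additional changes.
    for node, went_high in reversed(path):
        other = node.low if went_high else node.high
        if has_favorable_leaf(other):
            verb = "decrease" if went_high else "increase"
            return (f"{verb} {node.feature} (currently "
                    f"{image[node.feature]:.2f}, threshold {node.threshold:.2f})")
    return "no instruction available"  # the tree has no favorable leaf to aim for


# Toy tree with made-up thresholds: favorable only when contrast is high
# enough and saturation is not excessive.
TREE = Node("contrast", 0.4,
            low=Leaf(False),
            high=Node("saturation", 0.7, low=Leaf(True), high=Leaf(False)))

print(suggest(TREE, {"contrast": 0.6, "saturation": 0.9}))
print(suggest(TREE, {"contrast": 0.2, "saturation": 0.5}))
```

The sketch returns a fallback string when no favorable leaf is reachable, which loosely mirrors the case in which no instruction can be given.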