
Author: Shih-Chieh Lin (林士傑)
Title: Highly Efficient Hybrid Image Feature Capture for Visual Technology (應用於視覺技術之高效能混合式影像特徵擷取)
Advisor: Jing-Ming Guo (郭景明)
Committee Members: Yun-Fu Liu (劉雲夫), Chih-Hsien Hsia (夏至賢), Nai-Jian Wang (王乃堅), Chi-Chia Sun (宋啟嘉)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2017
Graduation Academic Year: 105 (ROC calendar)
Language: Chinese
Number of Pages: 190
Keywords (Chinese): 深度學習、物件偵測、半色調、半色調分類
Keywords (English): Deep Learning, Object Detection, Halftone, Halftone Classification

This thesis improves upon two areas of visual-computing research: video object detection and halftoning algorithms. After a detailed literature survey and analysis, we improve and extend the image processing algorithms underlying the prior art in both areas. Compared against recent prior work, the proposed methods achieve better results in both accuracy and efficiency.
In video object detection, this thesis proposes a novel framework that can improve any existing object detection technique. Although deep learning has made breakthroughs over traditional object detection methods in both processing speed and accuracy, there is still room for improvement in practical applications. For example, Mask R-CNN, the latest object detection research, achieves the highest accuracy to date, yet its processing speed of 5 fps makes timely processing of video difficult. Unlike previous approaches that process each frame independently, the proposed method transforms spatial-domain data into the temporal domain and then predicts an object's results across the entire video at once by processing the temporal data. The accuracy of this approach depends on the underlying object detector, so it yields the greatest gains for detectors with high accuracy but low processing speed, making them practical for video object detection.
For the halftoning research, we further divide the work into two topics: multi-mode halftoning and halftone classification. In the multi-mode halftoning study, we apply halftoning of different characteristics to different regions of an image to obtain the best visual quality. The experimental results show that the proposed method removes all of the artifact problems encountered in previous studies, so that halftone image quality is no longer degraded by such artifacts. In the halftone classification study, we consider 15 halftoning techniques simultaneously, including multitoning techniques; combined with deep learning, the classifier achieves very high accuracy and outperforms previous studies.


In video object detection, this thesis proposes a novel framework for object detection that achieves a high processing rate with reasonable accuracy. Although deep learning techniques can analyze videos precisely with good processing speed, there is still much room for improvement, especially in video processing applications. For example, Mask R-CNN, the latest object detection research, is known for its accuracy, but its processing speed is limited to only 5 fps, so it cannot be deployed for real-time video processing. The proposed method transforms the data from the spatial domain into the temporal domain, and then predicts the results for all frames by processing the time-domain data. The results show that the proposed method outperforms existing methods in processing time with fair accuracy, making it a reliable and feasible approach for various video processing applications.
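As a rough illustration of the spatial-to-temporal idea described above, the hedged Python sketch below stacks hypothetical per-frame detections (an x-coordinate and a confidence score from some image detector) into a temporal profile and then processes the whole profile at once. The moving average is only a stand-in for the thesis's learned track-profile processing, and the data and function names are invented for illustration:

```python
def to_temporal_profile(detections):
    """Stack per-frame (x, score) detections into two time series."""
    xs = [d[0] for d in detections]
    scores = [d[1] for d in detections]
    return xs, scores

def smooth(signal, radius=1):
    """Moving average over a window of 2*radius+1 frames (edges clamped)."""
    n = len(signal)
    out = []
    for t in range(n):
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

# Noisy per-frame detections of one object moving left to right;
# frame 1 has a spurious low confidence score.
frames = [(10, 0.9), (14, 0.2), (18, 0.95), (22, 0.88), (26, 0.91)]
xs, scores = to_temporal_profile(frames)
xs_smooth = smooth(xs)          # stabilized trajectory for every frame
scores_smooth = smooth(scores)  # the dip at frame 1 is partly recovered
```

The point of the sketch is that once the per-frame outputs are laid out along the time axis, a single pass over the temporal profile can correct isolated per-frame failures that frame-by-frame processing cannot.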
In the halftoning domain, the research comprises two areas: multi-mode halftoning and halftone classification. In the multi-mode halftoning study, various halftone strategies are adaptively chosen to achieve the best visual effect. The experimental results show that the proposed method removes the defect problems reported in previous studies and significantly improves the quality of the halftone image. In the halftone classification study, 15 halftone techniques are considered, including the latest multitone technology. Combined with deep learning, the presented technique runs rapidly and achieves higher accuracy than existing techniques.
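For readers unfamiliar with halftoning, the classic Floyd-Steinberg error diffusion reviewed in Chapter 3 [33] gives a concrete sense of what the compared techniques do: each pixel is thresholded to black or white, and the quantization error is spread to not-yet-processed neighbors. The sketch below is a minimal textbook implementation for illustration, not code from the thesis:

```python
def floyd_steinberg(image):
    """Binarize a grayscale image (list of rows, values 0-255) by
    Floyd-Steinberg error diffusion: threshold each pixel at 128 and
    diffuse the quantization error to the right and lower neighbors
    with the standard weights 7/16, 3/16, 5/16, 1/16."""
    h, w = len(image), len(image[0])
    img = [[float(v) for v in row] for row in image]  # accumulates error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = new
            err = old - new
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return out

# A flat mid-gray patch becomes a binary pattern whose local average
# approximates the input tone.
halftone = floyd_steinberg([[128] * 8 for _ in range(8)])
```

Error diffusion is one of the 15 halftone classes the thesis's classifier must distinguish; the others (ordered dithering, dot diffusion, DBS, and multitone variants) trade off texture, sharpness, and dot clustering differently.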

Chinese Abstract / Abstract / Acknowledgements / Table of Contents / List of Figures and Tables
Chapter 1  Introduction
  1.1 Video Object Detection
    1.1.1 Background
    1.1.2 Motivation and Objectives
  1.2 Halftoning
    1.2.1 Background
    1.2.2 Motivation and Objectives
  1.3 Thesis Organization
Chapter 2  Literature Review of CNN-Based Object Detection
  2.1 How Neural Networks Operate
    2.1.1 Forward Propagation
    2.1.2 Backward Propagation
  2.2 Factors Affecting Neural Network Performance
  2.3 Convolutional Neural Networks
    2.3.1 Convolution
    2.3.2 Nonlinear Activation Functions
    2.3.3 Pooling
    2.3.4 Training Methods
    2.3.5 Visualization
  2.4 Object Detection
    2.4.1 Region Convolutional Neural Network (R-CNN) [4]
    2.4.2 Fast R-CNN [5]
    2.4.3 Faster R-CNN [6]
    2.4.4 Single Shot Multibox Detector (SSD) [10]
    2.4.5 Mask R-CNN [7]
  2.5 Semantic Segmentation
    2.5.1 Fully Convolutional Network (FCN) [8]
    2.5.2 DeepLab [9]
Chapter 3  Literature Review of Halftoning
  3.1 Error Diffusion (ED)
  3.2 Ordered Dithering (OD)
  3.3 Dot Diffusion (DD)
  3.4 Direct Binary Search (DBS)
  3.5 Dual-Metric DBS (DMDBS)
Chapter 4  Improved Video Object Detection
  4.1 Video Object Tracks Based on Image Object Detection
    4.1.1 Anchor Point Proposal Network (APPN)
    4.1.2 Object Track Profile
  4.2 Accelerated Framework for Video Object Detection
    4.2.1 Object Track Profile Compensation
    4.2.2 Proposal Track Network Architecture
    4.2.3 Loss Function and Bounding Box Generation
  4.3 Experimental Results
    4.3.1 Datasets
    4.3.2 Implementation Details
    4.3.3 Comparative Results
Chapter 5  Improved Halftoning
  5.1 Multi-Mode Halftoning Based on Stochastic Clustered Dots
    5.1.1 Inter-Iterative Clustered-Dot Direct Binary Search
    5.1.2 Screen Design
    5.1.3 Multi-Mode Halftoning
    5.1.4 Experimental Results
  5.2 Halftone and Multitone Image Classification
    5.2.1 Classification Mechanism
    5.2.2 Effective Patches
    5.2.3 Convolutional Neural Network
    5.2.4 Experimental Results
Chapter 6  Conclusions and Future Work
  6.1 Video Object Detection
  6.2 Multi-Mode Halftoning Based on Stochastic Clustered Dots
  6.3 Halftone and Multitone Image Classification
References

[1]Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
[2]Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[3]Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9).
[4]Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).
[5]Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., ... & Darrell, T. (2014, November). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 675-678). ACM.
[6]Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91-99).
[7]He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. arXiv preprint arXiv:1703.06870.
[8]Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431-3440).
[9]Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2016). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv preprint arXiv:1606.00915.
[10]Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016, October). SSD: Single shot multibox detector. In European conference on computer vision (pp. 21-37). Springer, Cham.
[11]Redmon, J., & Farhadi, A. (2016). YOLO9000: Better, faster, stronger. arXiv preprint arXiv:1612.08242.
[12]Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 807-814).
[13]Zeiler, M. D., Krishnan, D., Taylor, G. W., & Fergus, R. (2010, June). Deconvolutional networks. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (pp. 2528-2535). IEEE.
[14]Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., ... & Ghemawat, S. (2016). Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
[15]Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., ... & Darrell, T. (2014, November). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 675-678). ACM.
[16]Ng, P. C., & Henikoff, S. (2003). SIFT: Predicting amino acid changes that affect protein function. Nucleic acids research, 31(13), 3812-3814.
[17]Dalal, N., & Triggs, B. (2005, June). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on (Vol. 1, pp. 886-893). IEEE.
[18]Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International journal of computer vision, 104(2), 154-171.
[19]Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779-788).
[20]Lin, M., Chen, Q., & Yan, S. (2013). Network in network. arXiv preprint arXiv:1312.4400.
[21]Ioffe, S., & Szegedy, C. (2015, June). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456).
[22]Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 248-255). IEEE.
[23]Y. F. Liu and J. M. Guo, “Clustered-dot screen design for digital multitoning,” IEEE Trans. Image Processing, vol. 25, no. 7, pp. 2971-2982, 2016.
[24]P. Goyal, M. Gupta, C. Staelin, M. Fischer, O. Shacham, and J. P. Allebach, “Clustered-dot halftoning with direct binary search,” IEEE Trans. Image Processing, vol. 22, no. 2, pp. 483-487, 2013.
[25]Database of Uncompressed Colour Image. Available website: http://homepages.lboro.ac.uk/~cogs/datasets/ucid/ucid.html
[26]S. J. Park, M. Q. Shaw, G. Kerby, T. Nelson, D.-Y. Tzeng, K. R. Bengtson, and J. P. Allebach, “Halftone blending between smooth and detail screens to improve print quality with electrophotographic printers,” IEEE Trans. Image Processing, vol. 25, no. 2, pp. 601-614, Feb. 2016.
[27]Q. Lin, “Adaptive halftoning based on image content,” U.S. Patent 5,970,178, assigned to Hewlett-Packard Development Company, 1999.
[28]K. Kritayakirana, D. Tretter, and Q. Lin, “Adaptive halftoning method and apparatus,” U.S. Patent 6,760,126, assigned to Hewlett-Packard Development Company.
[29]Y. F. Liu, J. M. Guo, and J. D. Lee, “Halftone image classification using LMS algorithm and naive Bayes,” IEEE Trans. Image Processing, vol. 20, no. 10, pp. 2837-2847, 2011.
[30]Keras: https://keras.io/
[31]R. Ulichney, Digital Halftoning. Cambridge, MA: MIT Press, 1987.
[32]DHALF: http://www.efg2.com/Lab/Library/ImageProcessing/DHALF.TXT
[33]R. W. Floyd and L. Steinberg, “An adaptive algorithm for spatial gray scale,” in Proc. SID 75 Digest. Soc. Inf. Display, 1975, pp. 36–37.
[34]J. F. Jarvis, C. N. Judice, and W. H. Ninke, “A survey of techniques for the display of continuous-tone pictures on bilevel displays,” Comput. Graph. Image Proc., vol. 5, no. 1, pp. 13–40, 1976.
[35]R. A. Ulichney, “Dithering with blue noise,” Proc. IEEE, vol. 76, no. 1, pp. 56-79, 1988.
[36]D. E. Knuth, “Digital halftones by dot diffusion,” ACM Trans. Graph., vol. 6, no. 4, pp. 245–273, 1987.
[37]M. Mese and P. P. Vaidyanathan, “Optimized halftoning using dot diffusion and methods for inverse halftoning,” IEEE Trans. Image Process., vol. 9, no. 4, pp. 691–709, Apr. 2000.
[38]M. Analoui and J. P. Allebach, “Model based halftoning using direct binary search,” in Proc. SPIE, Human Vision, Visual Proc., Digital Display III, San Jose, CA, Feb. 1992, vol. 1666, pp. 96–108.
[39]K. Chandu, M. Stanich, C. W. Wu, and B. Trager, “Direct multi-bit search (DMS) screen algorithm,” Image Processing (ICIP) 2012 IEEE international Conference on, pp. 817-820, 2012.
[40]P. Stucki, “MECCA-A multiple-error correcting computation algorithm for bilevel image hardcopy reproduction,” Res. Rep. RZ1060, IBM Res. Lab., Zurich, Switzerland, 1981.
[41]J. N. Shiau and Z. Fan, “A set of easily implementable coefficients in error diffusion with reduced worm artifacts,” SPIE, 2658: 222-225, 1996.
[42]P. Li and J. P. Allebach, “Block interlaced pinwheel error diffusion,” JEI, 14(2), Apr-Jun. 2005.
[43]V. Ostromoukhov, “A simple and efficient error diffusion algorithm,” Computer Graphics (Proceedings of SIGGRAPH 2001), pp. 567-572, 2001.
[44]J. M. Guo and Y. F. Liu, “Improved dot diffusion by diffused matrix and class matrix co-optimization,” IEEE Trans. Image Processing, 18(8), pp. 1804-1816, Aug. 2009.
[45]J. P. Allebach, “FM screen design using DBS algorithm,” in Proc. IEEE ICIP, vol. 1, Lausanne, Switzerland, pp. 549-552, 1996.
[46]D. J. Lieberman and J. P. Allebach, “Efficient model based halftoning using direct binary search,” in Proc. IEEE ICIP, pp. 775-778, 1997.
[47]D. J. Lieberman and J. P. Allebach, “A dual interpretation for direct binary search and its implications for tone reproduction and texture quality,” IEEE Trans. Image Processing, 9(11), pp. 1950-1963, 2000.
[48]S. H. Kim and J. P. Allebach, “Impact of HVS models on model-based halftoning,” IEEE Trans. Image Processing, 11(3), pp. 258-269, 2002.
[49]R. Näsänen, “Visibility of halftone dot textures,” IEEE Trans. Syst.,Man, Cybern., vol. 14, no. 6, pp. 920–924, 1984.
