
Graduate Student: Ho-shun Hung (洪賀順)
Thesis Title: Classification with High Intra-Class Variation: A Transfer Learning Approach (利用遷移式學習技術應用於組內差異性高的資料集分類方法之研究)
Advisor: Hsing-Kuo Pao (鮑興國)
Committee Members: Yuh-Jye Lee (李育杰), Tyng-Luh Liu (劉庭祿), Chuan-Kai Yang (楊傳凱)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2012
Graduation Academic Year: 100 (ROC calendar, 2011-2012)
Language: English
Number of Pages: 42
Chinese Keywords: 遷移式學習, TrAdaBoost, 組內差異性高資料集
English Keywords: transfer learning, TrAdaBoost, high intra-class variation data

We propose a binary classification method based on transfer learning for datasets with high intra-class variation. When only a small amount of such data can be collected, classification results are usually poor: a single class may be subdivided into many sub-classes, leaving even fewer samples per sub-class. For conventional instance-based methods such as boosting or AdaBoost, training then easily produces a weak classifier whose error rate exceeds one half, so the procedure halts with a useless result. We propose a transfer learning technique that efficiently integrates the useful information in high intra-class variation data and thereby achieves good classification performance. Following an idea similar to TrAdaBoost, our method distributes the data appropriately between the target and source domains, and then uses the transfer concept to select useful information from the source domain and combine it with the target-domain data, so that a relatively rich training set can be assembled. Unlike TrAdaBoost, we no longer blindly decrease the weights of source-domain data, which lets us retain more useful data. Our method makes two main contributions: it successfully handles the classification problem posed by high intra-class variation data, and it improves the performance of TrAdaBoost when the target and source domains consist of such data. Experimental results show that, on this type of data, the proposed method achieves higher accuracy than general classification methods and also outperforms TrAdaBoost.
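For context, the weight update of the original TrAdaBoost (Dai et al., ICML 2007) that the abstract contrasts with can be written as follows; this is the standard formulation from that paper, not the rule defined in this thesis. Here $h_t$ is the round-$t$ weak classifier, $\epsilon_t$ its error on the target domain, $n$ the number of source samples, and $N$ the number of boosting rounds:
\[
w_i^{t+1} =
\begin{cases}
w_i^{t}\,\beta^{\,|h_t(x_i)-y_i|}, & x_i \in \text{source domain},\\
w_i^{t}\,\beta_t^{\,-|h_t(x_i)-y_i|}, & x_i \in \text{target domain},
\end{cases}
\qquad
\beta = \frac{1}{1+\sqrt{2\ln n / N}},
\qquad
\beta_t = \frac{\epsilon_t}{1-\epsilon_t}.
\]
Since $\beta < 1$, a misclassified source sample can only lose weight under this rule; the monotone decrease on the source side is exactly what the proposed method relaxes.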


We propose a method based on a transfer learning approach to deal with the classification of high intra-class variation data. High intra-class variation is difficult to model, especially when only a limited dataset is available: a single concept may consist of several diverse sub-concepts, each with very few samples. Boosting or AdaBoost, for instance, cannot help much in this case, because we may easily produce a weak classifier with an error rate higher than one half, at which point the boosting procedure halts. We propose a transfer learning approach that effectively integrates the information from high-variation samples for successful modeling. In our approach, we put samples of high variation into the source and target domains, as in the design of TrAdaBoost; then, gradually, we select useful data from the source domain and combine them with the data in the target domain to form a rich training set. What differs from TrAdaBoost is that in our approach the weights of source-domain data are not necessarily decreased at every round; therefore, the proposed method can collect more useful data from the source domain than the typical TrAdaBoost. Our contribution is twofold: on one hand, we can successfully deal with high intra-class variation data; on the other hand, we can also improve the performance of TrAdaBoost when the data in the source and target domains exhibit high variation. Experimental results show that the proposed method achieves higher accuracy than other classification methods such as AdaBoost on high intra-class variation data; moreover, it performs better than TrAdaBoost on the same types of data.
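To make the mechanism concrete, the following is a minimal, hypothetical Python sketch of a TrAdaBoost-style training loop. The decrease_only=True branch follows the original TrAdaBoost source-weight update; the decrease_only=False branch is only an illustrative stand-in for the idea of not always down-weighting source samples, since the thesis's actual selection rule is not reproduced here. The function name, the decision-stump weak learner, the softened factor, and the toy data are all assumptions.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost_sketch(X_src, y_src, X_tgt, y_tgt, n_rounds=10, decrease_only=True):
    # Minimal TrAdaBoost-style loop (after Dai et al., ICML 2007).
    # Labels are assumed to be in {0, 1}.
    n_src = len(y_src)
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    w = np.ones(len(y)) / len(y)
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_src) / n_rounds))
    learners, betas = [], []
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1)      # weak learner: a stump
        h.fit(X, y, sample_weight=w / w.sum())
        miss = (h.predict(X) != y).astype(float)
        # TrAdaBoost measures the error on the target domain only
        eps = np.sum(w[n_src:] * miss[n_src:]) / np.sum(w[n_src:])
        if eps >= 0.5 or eps == 0.0:                 # boosting would halt here
            break
        beta_t = eps / (1.0 - eps)
        w[n_src:] *= beta_t ** (-miss[n_src:])       # up-weight target mistakes
        if decrease_only:
            # original rule: misclassified source samples always shrink
            w[:n_src] *= beta ** miss[:n_src]
        else:
            # hypothetical softened rule: shrink misclassified source samples
            # more mildly so useful source data retain more influence;
            # NOT the thesis's actual rule, just a stand-in for the idea
            w[:n_src] *= (0.5 + 0.5 * beta) ** miss[:n_src]
        learners.append(h)
        betas.append(beta_t)
    return learners, betas   # a final vote would follow TrAdaBoost's scheme

# Toy usage with synthetic data; labels follow the sign of the first feature.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(200, 2)); y_src = (X_src[:, 0] > 0).astype(int)
X_tgt = rng.normal(size=(30, 2));  y_tgt = (X_tgt[:, 0] > 0).astype(int)
learners, betas = tradaboost_sketch(X_src, y_src, X_tgt, y_tgt)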

1 Introduction
  1.1 Problem Proposed
  1.2 Research Framework
  1.3 Thesis Outline
2 Background
  2.1 AdaBoost Algorithm
  2.2 Transfer AdaBoost (TrAdaBoost) Algorithm
3 Methodology
  3.1 High Intra-Class Variation Data
  3.2 Classification with High Intra-Class Variation Data
  3.3 Drawback of the TrAdaBoost
  3.4 Data Selection
4 Experiment Results
  4.1 Feature Extraction
    4.1.1 Manifold Learning
    4.1.2 ISOMAP
  4.2 Experiment: Two-Dimensional Synthetic Data
  4.3 Experiment: Curved or Linear?
    4.3.1 Detection of the Curved and Linear Alphabet
    4.3.2 Detection of the Curved and Linear Digital Numbers
  4.4 Experiment: Apply to the Human Face Problem
5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work

