簡易檢索 / 詳目顯示

研究生: 劉麗貞
Li-zhen Liu
論文名稱: 可擴展標示查詢語言之串流查詢機制
Toward a Streaming Evaluation Model for XPath
指導教授: 李漢銘
Hahn-ming Lee
莊庭瑞
Tyng-ruey Chuang
口試委員: 鮑興國
Hsing-kuo Pao
何建明
Jan-ming Ho
蔡明祺
Mi-ching Tsai
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2008
畢業學年度: 96
語文別: 英文
論文頁數: 92
中文關鍵詞: XML 路徑語言串流
外文關鍵詞: XPath, streaming
相關次數: 點閱:186下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 自從一九九八年W3C公佈可擴展標示語言(eXtensible Markup Language, XML)的語法標準後,XML即被廣泛的應用在各種電子資料交換之上,XML文件資料量也隨之快速成長。當一份XML文件之資料量遠超過記憶體所能負載時,採用文件物件模型(Document Object Model, DOM)的查詢方式顯然勢不可行。
    本論文針對此類大資料量之XML文件提出一種串流路徑查詢機制,可以同時支援包括正向軸(forward axis)之子軸(child)、後代軸(descendant)、後繼軸(following)、後繼兄弟軸(following-sibling)以及逆向軸(backward axis)之母軸(parent)、先代軸(ancestor)、前驅軸(preceding)、前驅兄弟軸(preceding-sibling)的查詢功能。主要著重將一個XML 路徑語言(XPath)表達式轉換成適用於串流查詢的抽象語意樹,然而,並非所有轉換後的抽象語意樹皆適用於串流查詢,因此,在本文裡,我們也將說明如何將一個轉換後但不適用於串流查詢之抽象語意樹拆解成適用於串流查詢的語意樹。再將抽象語意樹視為計算模組(computation module), 並從串流資料剖析來的節點一一對映於抽象語意樹進行計算處理以求得查詢結果。從實驗的結果得知,本研究所提方法確實能在有限的記憶體與寶貴的時間兩者中取得平衡並提升查詢效率。


    Extensible Markup Language (XML), a markup language proposed by World Wide Web Consortium (W3C) in 1996, has become a standard format for the exchange of information among various applications on the Internet. With the popularity of XML, the amount of XML document is growing rapidly, when the document which is multiple times larger than available main memory, it will be failed to use DOM based approach.

    We present a streaming evaluation model for evaluating a XPath expression (involving forward axis, such as child, descendant, following, following-sibling and backward axes, such as parent, ancestor, preceding, and preceding-sibling axes) on a XML document. Besides, we propose an approach which can convert a XPath expression into a corresponding abstract syntax tree suited for evaluating on streaming XML document. Of course, not all XPath expressions can be evaluated this way. We show how to analyze a XPath expression to tell if and how it can be streaming evaluated, and if cannot, how to break the evaluation into multiple passes where each is in streaming manner. Furthermore, we view this abstract syntax tree as computation module against very large XML document. In the most practical case, the processing time is linear to the XML document size and memory is linear to the depth of XML document excluding backward axes. Experiments showed that proposed approach with a good efficiency on execution time within as little memory.

    Abstract...............................................II Acknowledgement........................................IV Content.................................................V List of Figures........................................IX List of Tables.........................................XI Chapter 1 Introduction...................................1 1.1 Motivation.....................................2 1.2 Challenge in Streaming XPath Evaluation........3 1.3 Goals..........................................4 1.4 Outline of the Thesis..........................5 Chapter 2 Background.....................................6 2.1 XML Document Model .............................6 2.2 XPath Language.................................8 2.3 Related Work..................................12 Chapter 3 Toward a Streaming Evaluation Model...........14 3.1 Concept of TSEM System........................14 3.2 System Overview...............................16 3.3 XML Document Indexing.........................17 3.4 Abstract Syntax Tree Construction.............18 3.4.1 Basic Operators..................................19 3.4.1.1 Forward Operators..............................19 3.4.1.2 Backward Operators.............................20 3.4.1.3 Summary........................................21 3.4.2 Streaming Characteristics of Operators...........22 3.4.2.1 Child(X,Y) .....................................22 3.4.2.2 Parent(X,Y)....................................23 3.4.2.3 Descendant(X,Y)................................23 3.4.2.4 Ancestor(X,Y)..................................24 3.4.2.5 Following(X,Y).................................25 3.4.2.6 Preceding(X,Y).................................26 3.4.2.7 Following-sibling(X,Y).........................27 3.4.2.8 Preceding-sibling(X,Y).........................27 3.4.2.9 Summary........................................28 3.4.3 Mapping a XPath Expression to Abstract Syntax Tree .................................................29 3.5 Decomposition of Abstract Syntax Tree.........34 3.6 Computation Module.................................36 3.6.1 Forward Axis of Abstract Syntax Tree.............37 3.6.2 Backward Axis of Abstract Syntax Tree............40 Chapter 4 Experiments...................................42 4.1 Experiment Design.............................42 4.2 Datasets and XPath Expression.................43 4.3 Experiment Results ............................45 4.3.1 Streaming Model vs. Non-Streaming Model..........45 4.3.1.1 Query Execution Time...........................45 4.3.1.2 Scalability of Query Processing Time...........47 4.3.1.3 Memory Usage...................................47 4.3.1.4 Scalability of Memory Usage....................49 4.3.2 Streaming Model..................................49 4.3.2.1 Forward Axis Processing........................49 4.3.2.2 Path Processing with Predicate.................53 4.3.2.3 Backward Axis Processing.......................57 4.3.3 Summary..........................................60 Chapter 5 Conclusion and Further Work ...................61 5.1 Conclusions...................................61 5.2 Further Work..................................62 References.............................................63 Appendix A.............................................69 List of Algorithm......................................69 Vita...................................................75

    [1] Extensible Markup Language (XML) 1.0 (Fourth Edition)
    http://www.w3.org/TR/xml/
    [2] XQuery 1.0: An XML Query Language
    http://www.w3.org/TR/xquery/
    [3] XML Path Language (XPath) Version 1.0
    http://www.w3.org/TR/xpath/
    [4] XSL Transformations (XSLT) Version 1.0
    http://www.w3.org/TR/xslt/
    [5] XML Pointer Language (XPointer) Version 1.0
    http://www.w3.org/TR/WD-xptr/
    [6] Introduction to GML Geography Markup Language
    http://www.w3.org/Mobile/posdep/GMLIntroduction.html

    [7] ebXML ENABLING A GLOBAL ELECTRONIC MARKET
    http://www.ebxml.org/
    [8] News Markup Language
    http://www.newsml.org/
    [9] Document Object Model (DOM)
    http://www.w3.org/DOM/
    [10] About SAX
    http://www.saxproject.org/
    [11] Dan Olteanu , Holger Meuss , Tim Furche , François Bry, “XPath: Looking Forward,” in Proceedings of the Worshops XMLDM, MDDE, and YRWS on XML-Based Data Management and Multimedia Engineering-Revised Papers, pp. 109-127, 2002.
    [12] Mehmet Altinel, Michael J. Franklin, “Efficient Filtering of XML Documents for Selective Dissemination of Information,” in Proceedings of the 26th International Conference on Very Large Data Bases, pp. 53-64, 2000.
    [13] Yanlei Diao, Peter Fischer, Michael J. Franklin, Raymond To, ”YFilter: “Efficient and Scalable Filtering of XML Documents,” in Proceedings of the 18th International Conference on Data Engineering, pp. 341, 2002.

    [14] Mehmet Altinel , Michael J. Franklin, “Efficient Filtering of XML Documents for Selective Dissemination of Information,“ in Proceedings of the 26th International Conference on Very Large Data Bases, pp. 53-64, 2000.
    [15] Todd J. Green, Gerome Miklau, Makoto Onizuka, Dan Suciu, “Processing XML Streams with Deterministic Automata,” in Proceedings of the 9th International Conference on Database Theory, pp.173-189, 2003.
    [16] Makoto Onizuka, “Light-weight XPath processing of XML stream with deterministic automata,” in Proceedings of the Twelfth International Conference on Information and Knowledge Management, pp. 342 - 349, 2003.
    [17] Ashish Kumar Gupta , Dan Suciu, “Stream processing of XPath queries with predicates,” in Proceedings of the ACM SIGMOD international conference on Management of data, pp. 419-430, 2003.
    [18] Danny Chen , Raymond K. Wong, “Optimizing the lazy DFA approach for XML stream processing,” in Proceedings of the fifteenth Australasian database conference, pp. 131-140, 2004.
    [19] Yang Weidong, Lee Daewook, Gu Ning, Shi Baile, Lee Sukho,” LeoXSS: An Efficient XML Stream System for Processing Complex XPaths,” in Proceedings of the Sixth IEEE International Conference on Computer and Information Technology, pp. 27, 2006.
    [20] Keerati Jittrawong, Raymond K. Wong,” Optimizing XPath queries on streaming XML data,” ACM International Conference Proceeding Series, Vol. 242, pp. Pages: 73 – 82, 2007.
    [21] Feng Peng, Sudarshan S. Chawathe, “XPath queries on streaming data,” in
    Proceedings of the 2003 ACM SIGMOD International Conference on
    Management of data, pp. 431 – 442, 2003.
    [22] Feng Peng, Sudarshan S. Chawathe, “XSQ: A Streaming XPath Engine,” ACM
    Transactions on Database Systems, Vol. 30, pp.577-623, 2005.
    [23] Dan. Olteanu, T. Kiesling, and F. Bry. “An Evaluation of Regular Path Expressions with Qualifiers against XML Streams,” in Proceedings of 19th International Conference on Data Engineering, pp. 702-704, 2002.
    [24] Dan Olteanu, Tim Furche, François Bry, “An efficient single-pass query evaluator for XML data streams,” in Proceedings of the 2004 ACM Symposium on Applied Computing, pp. 627 - 631, 2004.
    [25] Dan Olteanu, Tim Furche, and Francois Bry ,”Evaluating complex queries against XML streams with polynomial combined complexity,” in Proceedings of The British National Conference on Databases, pp. 31-44, 2004.
    [26] Fran_cois Bry, Fatih Coskun, Serap Durmaz, Tim Furche, Dan Olteanu, and Markus Spannagel,“ The XML stream query processor SPEX,” in International Conference on Data Engineering, pp.1120-1121, 2005.
    [27] Dan Olteanu, “SPEX: Streamed and progressive evaluation of XPath,” IEEE
    Transactions on Knowledge and Data Engineering, Vol. 19, pp. 934-949, 2007.

    [28] Yi Chen, Susan B. Davidson, Yifeng Zheng, “ViteX: a Streaming XPath Processing System,” In Proceedings of the 21th International Conference on Data Engineering, pp. 1118-1119, 2005.
    [29] Yi Chen, Susan B. Davidson, Yifeng Zheng, “An Efficient XPath Query Processor for XML Streams,” in Proceedings of the 22nd International Conference on Data Engineering, pp. 79, 2006.
    [30] Georg Gottlob, Christoph Koch, Reinhard Pichler, “Efficient algorithms for processing XPath queries,” in Proceedings of the 28th International Conference on Very Large Data Bases. pp. 95-106, 2002.
    [31] Georg Gottlob, Christoph Koch, Reinhard Pichler, “Efficient algorithms for processing XPath queries,” ACM Transactions on Database Systems, Vol. 30, pp. 444-491, 2005.
    [32] Shurug Al-Khalifa, H. V. Jagadish, Jignesh M. Patel, Yuqing Wu, Nick Koudas, Divesh Srivastava, “Structural Joins: A Primitive for Efficient XML Query Pattern Matching,” in Proceedings of the 18th International Conference on Data Engineering, pp. 141, 2002.
    [33] Nicolas Bruno, Nick Koudas, Divesh Srivastava, “Holistic twig joins: optimal XML pattern matching,” in Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 310-321, 2002.
    [34] Kim, S.M,Yoo, S.I.,Hong, E.,Kim, T.G.,Kim, I.K. “A document object modeling method to retrieve data from a very large XML document”, in Proceedings of the 2007 ACM Symposium on Document Engineering, pp. 59-68, 2007.
    [35] SAXON
    http://saxon.sourceforge.net/
    [36] Albrecht Schmidt, Florian. Waas, Martin. Kersten, Michael J. Carey, Ioana. Manolescu, and Ralph. Busse, “XMark: A benchmark for XML data management,” in Proceedings of International Conference on Very Large Databases, pp. 974-985, 2002.
    [37] Massimo. Franceschet. XPathMark - an xpath benchmark for xmark. Research report, 2005.

    QR CODE