Module pl.xml

XML LOM Utilities.

This implements some useful things on LOM documents, such as those returned by lxp.lom.parse. In particular, it can convert LOM back into XML text, with optional pretty-printing control. It is based on stanza.lua from Prosody.

> d = xml.parse "<nodes><node id='1'>alice</node></nodes>"
> = d
<nodes><node id='1'>alice</node></nodes>
> = xml.tostring(d,'','  ')
<nodes>
  <node id='1'>alice</node>
</nodes>

Can be used as a lightweight one-stop-shop for simple XML processing; a simple XML parser is included but the default is to use lxp.lom if it can be found.

 Prosody IM
 Copyright (C) 2008-2010 Matthew Wild
 Copyright (C) 2008-2010 Waqas Hussain
 classic Lua XML parser by Roberto Ierusalimschy.
 modified to output LOM format.
 http://lua-users.org/wiki/LuaXml
 
See the Guide

Dependencies: pl.utils

Soft Dependencies: lxp.lom (fallback is to use basic Lua parser)

Functions

new (tag[, attr={}]) create a new document node.
parse (text_or_filename, is_file, use_basic) parse an XML document.
elem (tag, items) Create a Node with a set of children (text or Nodes) and attributes.
tags (list) given a list of names, return a number of element constructors.
Doc:addtag (tag[, attrs={}]) Adds a document Node, at current position.
Doc:text (text) Adds a text node, at current position.
Doc:up () Moves current position up one level.
Doc:reset () Resets current position to top level.
Doc:add_direct_child (child) Append a child to the current Node (ignoring current position).
Doc:add_child (child) Append a child at the current position (without changing position).
Doc:set_attribs (t) Set attributes of a document node.
Doc:set_attrib (a, v) Set a single attribute of a document node.
Doc:get_attribs () Gets the attributes of a document node.
Doc.subst (template, data) create a substituted copy of a document.
Doc:child_with_name (tag) Return the first child with a given tag name (non-recursive).
Doc:get_elements_with_name (tag[, dont_recurse=false]) Returns all elements in a document that have a given tag.
Doc:children () Iterator over all children of a document node, including text nodes.
Doc:first_childtag () Return the first child element of a node, if it exists.
Doc:matching_tags ([tag=nil[, xmlns=nil]]) Iterator that matches tag names, and a namespace (non-recursive).
Doc:childtags () Iterator over all child tags of a document node.
Doc:maptags (callback) Visit child Nodes of a node and call a function, possibly modifying the document.
xml_escape (str) Escapes a string for safe use in xml.
xml_unescape (str) Unescapes a string from xml.
tostring (doc[, b_ind[, t_ind[, a_ind[, xml_preface]]]]) Function to pretty-print an XML document.
Doc:tostring ([b_ind[, t_ind[, a_ind[, xml_preface="<?xml version='1.0'?>"]]]]) Method to pretty-print an XML document.
Doc:get_text () get the full text value of an element.
clone (doc[, strsubst]) Returns a copy of a document.
Doc:filter ([strsubst]) Returns a copy of a document.
compare (t1, t2) Compare two documents or elements.
is_tag (d) is this value a document element?
walk (doc, depth_first, operation) Calls a function recursively over Nodes in the document.
parsehtml (s) Parse a well-formed HTML file as a string.
basic_parse (s, all_text, html) Parse a simple XML document using a pure Lua parser based on Roberto Ierusalimschy's original version.
Doc:match (pat) Match a document against a pattern (see the Guide).


Functions

new (tag[, attr={}])
create a new document node.

Parameters:

  • tag string the tag name
  • attr table attributes (table of name-value pairs) (default {})

Returns:

    the Node object

Usage:

    local doc = xml.new("main", { hello = "world", answer = "42" })
    print(doc)  -->  <main hello='world' answer='42'/>
parse (text_or_filename, is_file, use_basic)
parse an XML document. By default, this uses lxp.lom.parse, but it falls back to basic_parse if lxp.lom cannot be found or if use_basic is truthy.

Parameters:

  • text_or_filename file or string representation
  • is_file whether text_or_filename is a file name or not
  • use_basic do a basic parse

Returns:

  1. a parsed LOM document with the document metatables set
  2. nil, error the error can either be a file error or a parse error
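A minimal sketch of calling parse on a string and checking the nil, error return described above. It passes use_basic = true so the result does not depend on whether lxp.lom is installed; the sample XML is made up for illustration.

```lua
local xml = require "pl.xml"

-- Parse from a string (is_file = false); the third argument forces the pure-Lua parser.
local doc, err = xml.parse("<nodes><node id='1'>alice</node></nodes>", false, true)
if not doc then
  error("parse failed: " .. tostring(err))
end

print(doc.tag)         --> nodes
print(doc[1].attr.id)  --> 1
print(doc[1][1])       --> alice
```

Child nodes live in the list part of each Node, and attributes in its attr table, so plain indexing is enough for simple lookups.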
elem (tag, items)
Create a Node with a set of children (text or Nodes) and attributes.

Parameters:

  • tag string a tag name
  • items table or string either a single child (text or Node), or a table where the hash part is the attributes and the list part is the children (text or Nodes).

Returns:

    the new Node

Usage:

    local doc = xml.elem("top", "hello world")                -- <top>hello world</top>
    local doc = xml.elem("main", xml.new("child"))            -- <main><child/></main>
    local doc = xml.elem("main", { "this ", "is ", "nice" })  -- <main>this is nice</main>
    local doc = xml.elem("main", { xml.new "this",
                                   xml.new "is",
                                   xml.new "nice" })          -- <main><this/><is/><nice/></main>
    local doc = xml.elem("main", { hello = "world" })         -- <main hello='world'/>
    local doc = xml.elem("main", {
      "prefix",
      xml.elem("child", { "this ", "is ", "nice"}),
      "postfix",
      attrib = "value"
    })   -- <main attrib='value'>prefix<child>this is nice</child>postfix</main>
tags (list)
given a list of names, return a number of element constructors. If passing a comma-separated string, then whitespace surrounding the values will be stripped.

The returned constructor functions are a shortcut to xml.elem where you no longer provide the tag-name, but only the items table.

Parameters:

  • list string or table a list of names, or a comma-separated string.

Returns:

    (multiple) constructor functions; function(items). For the items parameter see xml.elem.

Usage:

    local new_parent, new_child = xml.tags 'mom, kid'
    doc = new_parent {new_child 'Bob', new_child 'Annie'}
    -- <mom><kid>Bob</kid><kid>Annie</kid></mom>
Doc:addtag (tag[, attrs={}])
Adds a document Node, at current position. This updates the last inserted position to the new Node.

Parameters:

  • tag string the tag name
  • attrs table attributes (table of name-value pairs) (default {})

Returns:

    the current node (self)

Usage:

    local doc = xml.new("main")
    doc:addtag("penlight", { hello = "world"})
    doc:addtag("expat")  -- added to 'penlight' since position moved
    print(doc)  -->  <main><penlight hello='world'><expat/></penlight></main>
Doc:text (text)
Adds a text node, at current position.

Parameters:

  • text string the text to add

Returns:

    the current node (self)

Usage:

    local doc = xml.new("main")
    doc:text("penlight")
    doc:text("expat")
    print(doc)  -->  <main>penlightexpat</main>
Doc:up ()
Moves current position up one level.

Returns:

    the current node (self)
Doc:reset ()
Resets current position to top level. Resets to the self node.

Returns:

    the current node (self)
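A short sketch of how addtag, up and reset move the current position, using only the methods documented above. The tag names are arbitrary.

```lua
local xml = require "pl.xml"

local doc = xml.new("main")
doc:addtag("a")   -- added to <main>; position moves into <a>
doc:addtag("b")   -- added inside <a>; position moves into <b>
doc:up()          -- position back to <a>
doc:addtag("c")   -- sibling of <b>, inside <a>
doc:reset()       -- position back to <main>
doc:addtag("d")   -- direct child of <main>

print(doc)  -->  <main><a><b/><c/></a><d/></main>
```

Because each of these methods returns self, the calls can also be chained.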
Doc:add_direct_child (child)
Append a child to the current Node (ignoring current position).

Parameters:

  • child a child node (either text or a document)

Returns:

    the current node (self)

Usage:

    local doc = xml.new("main")
    doc:add_direct_child("dog")
    doc:add_direct_child(xml.new("child"))
    doc:add_direct_child("cat")
    print(doc)  -->  <main>dog<child/>cat</main>
Doc:add_child (child)
Append a child at the current position (without changing position).

Parameters:

  • child a child node (either text or a document)

Returns:

    the current node (self)

Usage:

    local doc = xml.new("main")
    doc:addtag("one")
    doc:add_child(xml.new("item1"))
    doc:add_child(xml.new("item2"))
    doc:add_child(xml.new("item3"))
    print(doc)  -->  <main><one><item1/><item2/><item3/></one></main>
Doc:set_attribs (t)
Set attributes of a document node. Will add/overwrite values, but will not remove existing ones. Operates on the Node itself and does not take the current position into account.

Parameters:

  • t table a table containing attribute/value pairs

Returns:

    the current node (self)
Doc:set_attrib (a, v)
Set a single attribute of a document node. Operates on the Node itself and does not take the current position into account.

Parameters:

  • a attribute
  • v its value, pass in nil to delete the attribute

Returns:

    the current node (self)
Doc:get_attribs ()
Gets the attributes of a document node. Operates on the Node itself and does not take the current position into account.

Returns:

    table with attributes (attribute/value pairs)
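The three attribute methods can be sketched together as follows. The attribute names are made up; the point is that set_attribs merges, set_attrib with nil deletes, and all of them act on the Node they are called on rather than on the current position.

```lua
local xml = require "pl.xml"

local doc = xml.new("main", { id = "1" })
doc:set_attribs({ lang = "en", id = "2" })  -- adds lang, overwrites id
doc:set_attrib("lang", nil)                 -- deletes lang again

doc:addtag("child")                         -- current position is now <child>
doc:set_attrib("x", "y")                    -- still sets the attribute on <main>

local attrs = doc:get_attribs()
print(attrs.id, attrs.lang, attrs.x)        -- id is "2", lang is nil, x is "y"
```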
Doc.subst (template, data)
create a substituted copy of a document.

Parameters:

  • template may be a document or a string representation which will be parsed and cached
  • data a table of name-value pairs or a list of such tables

Returns:

    an XML document
Doc:child_with_name (tag)
Return the first child with a given tag name (non-recursive).

Parameters:

  • tag the tag name

Returns:

    the child Node found or nil if not found
Doc:get_elements_with_name (tag[, dont_recurse=false])
Returns all elements in a document that have a given tag.

Parameters:

  • tag string a tag name
  • dont_recurse boolean optionally only return the immediate children with this tag name (default false)

Returns:

    a list of elements found, list will be empty if none was found.
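To illustrate the difference between the non-recursive and recursive lookups, here is a small sketch on a made-up document with one nested item.

```lua
local xml = require "pl.xml"

local doc = xml.elem("root", {
  xml.elem("item", "one"),
  xml.elem("group", { xml.elem("item", "two") }),
})

local first  = doc:child_with_name("item")               -- direct children only
local all    = doc:get_elements_with_name("item")        -- recursive by default
local direct = doc:get_elements_with_name("item", true)  -- dont_recurse = true

print(first:get_text(), #all, #direct)  -- "one", 2 and 1
```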
Doc:children ()
Iterator over all children of a document node, including text nodes. This function is not recursive, so returns only direct child nodes.

Returns:

    iterator that returns a single Node per iteration.
Doc:first_childtag ()
Return the first child element of a node, if it exists. This will skip text nodes.

Returns:

    first child Node or nil if there is none.
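A brief sketch contrasting children, which also yields text nodes, with first_childtag, which skips them. The document content is illustrative only.

```lua
local xml = require "pl.xml"

local doc = xml.elem("p", { "hello ", xml.elem("b", "big"), " world" })

-- Record what kind of child each iteration step returns.
local kinds = {}
for child in doc:children() do
  kinds[#kinds + 1] = xml.is_tag(child) and child.tag or "text"
end

print(table.concat(kinds, ","))  --> text,b,text
print(doc:first_childtag().tag)  --> b
```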
Doc:matching_tags ([tag=nil[, xmlns=nil]])
Iterator that matches tag names, and a namespace (non-recursive).

Parameters:

  • tag string tag names to return. Returns all tags if not provided. (default nil)
  • xmlns string the namespace value ('xmlns' attribute) to return. If not provided will match all namespaces. (default nil)

Returns:

    iterator that returns a single Node per iteration.
Doc:childtags ()
Iterator over all child tags of a document node. This will skip over text nodes.

Returns:

    iterator that returns a single Node per iteration.
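The two tag iterators can be compared with a small sketch: childtags visits every child element, while matching_tags filters by tag name. No namespace is used here, and the tag names are invented for the example.

```lua
local xml = require "pl.xml"

local doc = xml.elem("list", {
  "some text",
  xml.elem("item", "a"),
  xml.elem("note", "n"),
  xml.elem("item", "b"),
})

local all = {}
for node in doc:childtags() do
  all[#all + 1] = node.tag
end
print(table.concat(all, ","))    --> item,note,item

local items = {}
for node in doc:matching_tags("item") do
  items[#items + 1] = node:get_text()
end
print(table.concat(items, ","))  --> a,b
```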
Doc:maptags (callback)
Visit child Nodes of a node and call a function, possibly modifying the document. Text elements will be skipped. This is not recursive, so only direct children will be passed.

Parameters:

  • callback function a function with signature function(node), passed the node. The element will be updated with the returned value, or deleted if it returns nil.
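As a sketch of the callback contract described above (return a Node to keep or replace the child, return nil to delete it), the following uses hypothetical tag names "keep", "drop" and "old".

```lua
local xml = require "pl.xml"

local doc = xml.elem("list", {
  xml.elem("keep", "1"),
  "text is skipped",
  xml.elem("drop", "2"),
  xml.elem("old", "3"),
})

doc:maptags(function(node)
  if node.tag == "drop" then
    return nil                               -- delete this child
  elseif node.tag == "old" then
    return xml.elem("new", node:get_text())  -- replace this child
  end
  return node                                -- keep unchanged
end)

print(doc)  -->  <list><keep>1</keep>text is skipped<new>3</new></list>
```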
xml_escape (str)
Escapes a string for safe use in XML. Handles quotes (single and double), less-than, greater-than, and ampersand.

Parameters:

  • str string string value to escape

Returns:

    escaped string

Usage:

    local esc = xml.xml_escape([["'<>&]])  --> "&quot;&apos;&lt;&gt;&amp;"
xml_unescape (str)
Unescapes a string from XML. Handles quotes (single and double), less-than, greater-than, and ampersand.

Parameters:

  • str string string value to unescape

Returns:

    unescaped string

Usage:

    local unesc = xml.xml_unescape("&quot;&apos;&lt;&gt;&amp;")  --> [["'<>&]]
tostring (doc[, b_ind[, t_ind[, a_ind[, xml_preface]]]])
Function to pretty-print an XML document.

Parameters:

  • doc an XML document
  • b_ind string or int an initial block-indent (required when t_ind is set) (optional)
  • t_ind string or int a tag-indent for each level (required when a_ind is set) (optional)
  • a_ind string or int if given, indent each attribute pair and put on a separate line (optional)
  • xml_preface string or bool force prefacing with a default or custom <?xml ...?> declaration; if truthy but not a string, <?xml version='1.0'?> is used as the default. (optional)

Returns:

    a string representation

Doc:tostring ([b_ind[, t_ind[, a_ind[, xml_preface="<?xml version='1.0'?>"]]]])
Method to pretty-print an XML document. Invokes xml.tostring.

Parameters:

  • b_ind string or int an initial indent (required when t_ind is set) (optional)
  • t_ind string or int an indent for each level (required when a_ind is set) (optional)
  • a_ind string or int if given, indent each attribute pair and put on a separate line (optional)
  • xml_preface string force prefacing with a default or custom <?xml ...?> declaration (default "<?xml version='1.0'?>")

Returns:

    a string representation
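A hedged sketch of the function and method forms side by side. It parses with the basic parser so it runs without lxp.lom; the exact whitespace of the pretty-printed form is printed rather than assumed.

```lua
local xml = require "pl.xml"

local doc = xml.parse("<nodes><node id='1'>alice</node></nodes>", false, true)

local compact   = xml.tostring(doc)                       -- no indentation
local pretty    = doc:tostring("", "  ")                  -- empty block indent, two-space tag indent
local with_decl = xml.tostring(doc, nil, nil, nil, true)  -- default XML declaration first

print(compact)    --> <nodes><node id='1'>alice</node></nodes>
print(pretty)
print(with_decl)
```

Passing a string instead of true as the last argument uses that string as the preface.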

Doc:get_text ()
get the full text value of an element.

Returns:

    a single string with all text elements concatenated

Usage:

    local doc = xml.new("main")
    doc:text("one")
    doc:add_child(xml.elem "two")
    doc:text("three")
    
    local t = doc:get_text()    -->  "onethree"
clone (doc[, strsubst])

Returns a copy of a document. The strsubst parameter is a callback with signature function(object, kind, parent).

Param kind has the following values, and parameters:

  • "*TAG": object is the tag-name, parent is the Node object. Returns the new tag name.

  • "*TEXT": object is the text-element, parent is the Node object. Returns the new text value.

  • other strings not prefixed with *: kind is the attribute name, object is the attribute value, parent is the Node object. Returns the new attribute value.

Parameters:

  • doc Node or string a Node object or string (text node)
  • strsubst function an optional function for handling string copying which could do substitution, etc. (optional)

Returns:

    copy of the document
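The strsubst callback can be sketched as below. The callback itself is hypothetical: it upper-cases text nodes, renames every tag to "shout", and returns attribute values unchanged.

```lua
local xml = require "pl.xml"

local doc = xml.elem("greeting", { lang = "en", "hello" })

local copy = xml.clone(doc, function(object, kind, parent)
  if kind == "*TEXT" then
    return object:upper()  -- object is the text value
  elseif kind == "*TAG" then
    return "shout"         -- object is the tag name
  end
  return object            -- attribute value: kind is the attribute name
end)

print(copy)  -->  <shout lang='en'>HELLO</shout>
print(doc)   -->  <greeting lang='en'>hello</greeting>
```

The original document is left untouched; only the copy carries the substituted values.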

Doc:filter ([strsubst])
Returns a copy of a document. This is the method version of xml.clone.

Parameters:

  • strsubst function an optional function for handling string copying (optional)

compare (t1, t2)
Compare two documents or elements. Equality is based on tag, child nodes (text and tags), attributes and order of those (order only fails if both are given, and not equal).

Parameters:

  • t1 Node or string a Node object or string (text node)
  • t2 Node or string a Node object or string (text node)

Returns:

    boolean true when the Nodes are equal.
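A small sketch of compare on made-up elements: two structurally identical nodes compare equal, while a differing attribute value makes the comparison fail.

```lua
local xml = require "pl.xml"

local a = xml.elem("item", { id = "1", "text" })
local b = xml.elem("item", { id = "1", "text" })
local c = xml.elem("item", { id = "2", "text" })

local same = xml.compare(a, b)       -- true: same tag, attributes and children
local different = xml.compare(a, c)  -- false: attribute id differs

print(same, different)
```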
is_tag (d)
is this value a document element?

Parameters:

  • d any value

Returns:

    boolean true if it is a table with property tag being a string value.
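As the return description suggests, only the tag field is inspected. A quick sketch with a few illustrative values:

```lua
local xml = require "pl.xml"

print(xml.is_tag(xml.new("a")))        --> true
print(xml.is_tag("plain text"))        --> false
print(xml.is_tag({ tag = "manual" }))  --> true  (any table with a string tag)
print(xml.is_tag({ tag = 42 }))        --> false (tag is not a string)
```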
walk (doc, depth_first, operation)
Calls a function recursively over Nodes in the document. Will only call on tags, it will skip text nodes. The function signature for operation is function(tag_name, Node).

Parameters:

  • doc Node or string a Node object or string (text node)
  • depth_first boolean visit child nodes first, then the current node
  • operation function a function which will receive the current tag name and current node.
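The effect of depth_first on visiting order can be sketched by collecting tag names. The helper function order is not part of the module; it exists only for this example.

```lua
local xml = require "pl.xml"

local doc = xml.elem("a", {
  xml.elem("b", { xml.elem("c") }),
  "text",  -- text nodes are skipped by walk
  xml.elem("d"),
})

-- Hypothetical helper: returns the visited tag names as one string.
local function order(depth_first)
  local seen = {}
  xml.walk(doc, depth_first, function(tag_name, node)
    seen[#seen + 1] = tag_name
  end)
  return table.concat(seen, " ")
end

print(order(false))  --> a b c d
print(order(true))   --> c b d a
```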
parsehtml (s)
Parse a well-formed HTML file as a string. Tags are case-insensitive, DOCTYPE is ignored, and empty elements (such as <br>) do not need to be closed.

Parameters:

  • s the HTML
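A minimal sketch of the relaxed rules, using a made-up snippet with mixed-case tags and an unclosed br element. It assumes tag names are normalised to lower case in the result.

```lua
local xml = require "pl.xml"

local doc = xml.parsehtml("<HTML><Body><P>Hello<br>World</P></Body></HTML>")
local p = doc:child_with_name("body"):child_with_name("p")

print(doc.tag)   -- html
print(p[1])      -- Hello
print(p[2].tag)  -- br
print(p[3])      -- World
```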
basic_parse (s, all_text, html)
Parse a simple XML document using a pure Lua parser based on Roberto Ierusalimschy's original version.

Parameters:

  • s the XML document to be parsed.
  • all_text if true, preserves all whitespace. Otherwise only text containing non-whitespace is included.
  • html if true, uses relaxed HTML rules for parsing
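The all_text flag can be sketched by parsing the same made-up input twice and counting child nodes; whitespace-only text is dropped unless all_text is true.

```lua
local xml = require "pl.xml"

local src = "<a> <b/> </a>"
local trimmed = xml.basic_parse(src)        -- whitespace-only text is dropped
local full    = xml.basic_parse(src, true)  -- whitespace text nodes are kept

print(#trimmed, #full)  -- trimmed has 1 child, full has 3
```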
Doc:match (pat)
Match a document against a pattern (see the Guide).

Parameters:

  • pat
generated by LDoc 1.5.0