Module pl.xml
XML LOM Utilities.
This implements some useful things on LOM documents, such as those returned by lxp.lom.parse.
In particular, it can convert LOM back into XML text, with optional pretty-printing control.
It is based on stanza.lua from Prosody.
    > d = xml.parse "<nodes><node id='1'>alice</node></nodes>"
    > = d
    <nodes><node id='1'>alice</node></nodes>
    > = xml.tostring(d,'',' ')
    <nodes>
     <node id='1'>alice</node>
    </nodes>
Can be used as a lightweight one-stop-shop for simple XML processing; a simple XML parser is included, but the default is to use lxp.lom if it can be found.
Prosody IM
Copyright (C) 2008-2010 Matthew Wild
Copyright (C) 2008-2010 Waqas Hussain
--
Classic Lua XML parser by Roberto Ierusalimschy, modified to output LOM format.
http://lua-users.org/wiki/LuaXml
See the Guide.
Dependencies: pl.utils
Soft Dependencies: lxp.lom (fallback is to use basic Lua parser)
Functions
new (tag[, attr={}]) | create a new document node. |
parse (text_or_filename, is_file, use_basic) | parse an XML document. |
elem (tag, items) | Create a Node with a set of children (text or Nodes) and attributes. |
tags (list) | given a list of names, return a number of element constructors. |
Doc:addtag (tag[, attrs={}]) | Adds a document Node, at current position. |
Doc:text (text) | Adds a text node, at current position. |
Doc:up () | Moves current position up one level. |
Doc:reset () | Resets current position to top level. |
Doc:add_direct_child (child) | Append a child to the current Node (ignoring current position). |
Doc:add_child (child) | Append a child at the current position (without changing position). |
Doc:set_attribs (t) | Set attributes of a document node. |
Doc:set_attrib (a, v) | Set a single attribute of a document node. |
Doc:get_attribs () | Gets the attributes of a document node. |
Doc.subst (template, data) | create a substituted copy of a document. |
Doc:child_with_name (tag) | Return the first child with a given tag name (non-recursive). |
Doc:get_elements_with_name (tag[, dont_recurse=false]) | Returns all elements in a document that have a given tag. |
Doc:children () | Iterator over all children of a document node, including text nodes. |
Doc:first_childtag () | Return the first child element of a node, if it exists. |
Doc:matching_tags ([tag=nil[, xmlns=nil]]) | Iterator that matches tag names, and a namespace (non-recursive). |
Doc:childtags () | Iterator over all child tags of a document node. |
Doc:maptags (callback) | Visit child Nodes of a node and call a function, possibly modifying the document. |
xml_escape (str) | Escapes a string for safe use in xml. |
xml_unescape (str) | Unescapes a string from xml. |
tostring (doc[, b_ind[, t_ind[, a_ind[, xml_preface]]]]) | Function to pretty-print an XML document. |
Doc:tostring ([b_ind[, t_ind[, a_ind[, xml_preface="<?xml version='1.0'?>"]]]]) | Method to pretty-print an XML document. |
Doc:get_text () | get the full text value of an element. |
clone (doc[, strsubst]) | Returns a copy of a document. |
Doc:filter ([strsubst]) | Returns a copy of a document. |
compare (t1, t2) | Compare two documents or elements. |
is_tag (d) | is this value a document element? |
walk (doc, depth_first, operation) | Calls a function recursively over Nodes in the document. |
parsehtml (s) | Parse a well-formed HTML file as a string. |
basic_parse (s, all_text, html) | Parse a simple XML document using a pure Lua parser based on Roberto Ierusalimschy's original version. |
Doc:match (pat) | does something... |
Functions
- new (tag[, attr={}])
create a new document node.
Parameters:
- tag the tag name
- attr optional table of attributes (default {})
Returns:
- the Node object
Usage:
    local doc = xml.new("main", { hello = "world", answer = "42" })
    print(doc)  --> <main hello='world' answer='42'/>
- parse (text_or_filename, is_file, use_basic)
parse an XML document. By default, this uses lxp.lom.parse, but falls back to basic_parse if lxp.lom cannot be found, or if use_basic is truthy.
Parameters:
- text_or_filename file or string representation
- is_file whether text_or_filename is a file name or not
- use_basic do a basic parse
Returns:
- a parsed LOM document with the document metatables set
- nil, error on failure; the error can be either a file error or a parse error
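As a rough sketch of the call shape, the snippet below parses a string with the pure-Lua parser by passing a truthy use_basic, so it should not depend on whether lxp.lom is installed. It assumes Penlight is installed and loadable as `pl.xml`; the variable names are illustrative only.

```lua
-- Assumes Penlight is installed and reachable via require.
local xml = require "pl.xml"

local source = "<nodes><node id='1'>alice</node></nodes>"

-- is_file = false: the first argument is XML text, not a file name.
-- use_basic = true: force the bundled pure-Lua parser.
local doc, err = xml.parse(source, false, true)
if not doc then
  error("parse failed: " .. tostring(err))
end

-- A parsed document is a LOM table: tag, attr, and children in the list part.
local first = doc[1]
print(doc.tag)        -- tag name of the root element
print(first.attr.id)  -- attribute value of the first child
print(first[1])       -- text content of the first child
```

The `nil, err` check matters mostly for the lxp.lom and file-reading paths; treat the exact error behaviour of the basic parser as something to verify against your installed version.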
- elem (tag, items)
Create a Node with a set of children (text or Nodes) and attributes.
Parameters:
- tag string a tag name
- items table or string either a single child (text or Node), or a table where the hash part is the attributes and the list part is the children (text or Nodes).
Returns:
- the new Node
Usage:
    local doc = xml.elem("top", "hello world")                -- <top>hello world</top>
    local doc = xml.elem("main", xml.new("child"))            -- <main><child/></main>
    local doc = xml.elem("main", { "this ", "is ", "nice" })  -- <main>this is nice</main>
    local doc = xml.elem("main", { xml.new "this",
                                   xml.new "is",
                                   xml.new "nice" })          -- <main><this/><is/><nice/></main>
    local doc = xml.elem("main", { hello = "world" })         -- <main hello='world'/>
    local doc = xml.elem("main", {
      "prefix",
      xml.elem("child", { "this ", "is ", "nice"}),
      "postfix",
      attrib = "value"
    })   -- <main attrib='value'>prefix<child>this is nice</child>postfix</main>
- tags (list)
given a list of names, return a number of element constructors.
If passing a comma-separated string, then whitespace surrounding the values will be stripped.
The returned constructor functions are a shortcut to xml.elem where you no longer provide the tag-name, but only the items table.
Parameters:
- list a list of names, or a comma-separated string
Returns:
- (multiple) constructor functions; function(items). For the items parameter see xml.elem.
Usage:
    local new_parent, new_child = xml.tags 'mom, kid'
    doc = new_parent {new_child 'Bob', new_child 'Annie'}
    -- <mom><kid>Bob</kid><kid>Annie</kid></mom>
- Doc:addtag (tag[, attrs={}])
Adds a document Node, at current position.
This updates the last inserted position to the new Node.
Parameters:
- tag the tag name
- attrs optional table of attributes (default {})
Returns:
- the current node (self)
Usage:
    local doc = xml.new("main")
    doc:addtag("penlight", { hello = "world"})
    doc:addtag("expat")   -- added to 'penlight' since position moved
    print(doc)  --> <main><penlight hello='world'><expat/></penlight></main>
- Doc:text (text)
Adds a text node, at current position.
Parameters:
- text string a string
Returns:
- the current node (self)
Usage:
    local doc = xml.new("main")
    doc:text("penlight")
    doc:text("expat")
    print(doc)  --> <main>penlightexpat</main>
- Doc:up ()
Moves current position up one level.
Returns:
- the current node (self)
- Doc:reset ()
Resets current position to top level.
Resets to the self node.
Returns:
- the current node (self)
- Doc:add_direct_child (child)
Append a child to the current Node (ignoring current position).
Parameters:
- child a child node (either text or a document)
Returns:
- the current node (self)
Usage:
    local doc = xml.new("main")
    doc:add_direct_child("dog")
    doc:add_direct_child(xml.new("child"))
    doc:add_direct_child("cat")
    print(doc)  --> <main>dog<child/>cat</main>
- Doc:add_child (child)
Append a child at the current position (without changing position).
Parameters:
- child a child node (either text or a document)
Returns:
- the current node (self)
Usage:
    local doc = xml.new("main")
    doc:addtag("one")
    doc:add_child(xml.new("item1"))
    doc:add_child(xml.new("item2"))
    doc:add_child(xml.new("item3"))
    print(doc)  --> <main><one><item1/><item2/><item3/></one></main>
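The position-based methods described above (addtag, text, up, reset) are easiest to follow when combined in one build. The following is a small sketch, assuming Penlight is loadable as `pl.xml`; the tag names are made up for illustration.

```lua
-- Assumes Penlight is installed and reachable via require.
local xml = require "pl.xml"

local doc = xml.new("library")
doc:addtag("book")    -- position moves into <book>
doc:addtag("title")   -- position moves into <title>
doc:text("Dune")      -- text lands inside <title>
doc:up()              -- position moves back to <book>
doc:addtag("year")    -- sibling of <title>, position moves into <year>
doc:text("1965")
doc:reset()           -- position moves back to the top-level <library>
doc:addtag("book")    -- a second, empty <book> next to the first

local out = xml.tostring(doc)
print(out)
--> <library><book><title>Dune</title><year>1965</year></book><book/></library>
```

Without the call to up, the year element would be nested inside title; without reset, the second book would be nested inside year.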
- Doc:set_attribs (t)
Set attributes of a document node.
Will add/overwrite values, but will not remove existing ones.
Operates on the Node itself, will not take position into account.
Parameters:
- t table a table containing attribute/value pairs
Returns:
- the current node (self)
- Doc:set_attrib (a, v)
Set a single attribute of a document node.
Operates on the Node itself, will not take position into account.
Parameters:
- a attribute
- v its value, pass in nil to delete the attribute
Returns:
- the current node (self)
- Doc:get_attribs ()
Gets the attributes of a document node.
Operates on the Node itself, will not take position into account.
Returns:
- table with attributes (attribute/value pairs)
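The three attribute methods above can be exercised together. This is a minimal sketch, assuming Penlight is loadable as `pl.xml`; the attribute names are hypothetical.

```lua
-- Assumes Penlight is installed and reachable via require.
local xml = require "pl.xml"

local node = xml.new("item", { id = "1" })

-- Adds 'color' and overwrites nothing else; 'id' stays in place.
node:set_attribs({ color = "red" })

-- Overwrite a single attribute, then delete another by passing nil.
node:set_attrib("color", "blue")
node:set_attrib("id", nil)

local attrs = node:get_attribs()
print(attrs.color)  -- the overwritten value
print(attrs.id)     -- nil, since the attribute was deleted
```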
- Doc.subst (template, data)
create a substituted copy of a document.
Parameters:
- template may be a document or a string representation which will be parsed and cached
- data a table of name-value pairs or a list of such tables
Returns:
- an XML document
- Doc:child_with_name (tag)
Return the first child with a given tag name (non-recursive).
Parameters:
- tag the tag name
Returns:
- the child Node found or nil if not found
- Doc:get_elements_with_name (tag[, dont_recurse=false])
Returns all elements in a document that have a given tag.
Parameters:
- tag string a tag name
- dont_recurse boolean optionally only return the immediate children with this tag name (default false)
Returns:
- a list of elements found, list will be empty if none was found.
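To make the difference between the recursive and non-recursive lookups concrete, here is a short sketch, assuming Penlight is loadable as `pl.xml`. The document shape and tag names are invented for the example.

```lua
-- Assumes Penlight is installed and reachable via require.
local xml = require "pl.xml"

-- <root><item>a</item><group><item>b</item></group></root>
local doc = xml.elem("root", {
  xml.elem("item", "a"),
  xml.elem("group", { xml.elem("item", "b") }),
})

local all_items    = doc:get_elements_with_name("item")        -- searches nested levels too
local direct_items = doc:get_elements_with_name("item", true)  -- immediate children only
local group        = doc:child_with_name("group")              -- first direct child, or nil
local missing      = doc:child_with_name("nope")

print(#all_items, #direct_items)
print(group.tag, missing)
```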
- Doc:children ()
Iterator over all children of a document node, including text nodes.
This function is not recursive, so returns only direct child nodes.
Returns:
- iterator that returns a single Node per iteration.
- Doc:first_childtag ()
Return the first child element of a node, if it exists.
This will skip text nodes.
Returns:
- first child Node or nil if there is none.
- Doc:matching_tags ([tag=nil[, xmlns=nil]])
Iterator that matches tag names, and a namespace (non-recursive).
Parameters:
- tag string tag names to return. Returns all tags if not provided. (default nil)
- xmlns string the namespace value ('xmlns' attribute) to return. If not provided will match all namespaces. (default nil)
Returns:
- iterator that returns a single Node per iteration.
- Doc:childtags ()
Iterator over all child tags of a document node. This will skip over text nodes.
Returns:
- iterator that returns a single Node per iteration.
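The iterators above differ mainly in whether text nodes are visited. The sketch below contrasts children, childtags and first_childtag on a node with mixed content; it assumes Penlight is loadable as `pl.xml`, and the tags are illustrative.

```lua
-- Assumes Penlight is installed and reachable via require.
local xml = require "pl.xml"

-- <p>hello <b/><i/></p> : one text child followed by two element children
local doc = xml.elem("p", { "hello ", xml.new("b"), xml.new("i") })

local child_count = 0
for _ in doc:children() do       -- visits text and element children
  child_count = child_count + 1
end

local tag_names = {}
for node in doc:childtags() do   -- visits element children only
  tag_names[#tag_names + 1] = node.tag
end

local first = doc:first_childtag()

print(child_count)                   -- number of direct children
print(table.concat(tag_names, ","))  -- element children in document order
print(first.tag)                     -- first element child, text skipped
```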
- Doc:maptags (callback)
Visit child Nodes of a node and call a function, possibly modifying the document.
Text elements will be skipped.
This is not recursive, so only direct children will be passed.
Parameters:
- callback function a function with signature function(node), passed the node. The element will be updated with the returned value, or deleted if it returns nil.
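A typical use of the callback contract described above is filtering: return the node to keep it, return nil to drop it. This is a minimal sketch, assuming Penlight is loadable as `pl.xml`; the tag names "keep" and "tmp" are hypothetical.

```lua
-- Assumes Penlight is installed and reachable via require.
local xml = require "pl.xml"

local doc = xml.elem("list", { xml.new("keep"), xml.new("tmp"), xml.new("keep") })

doc:maptags(function(node)
  if node.tag == "tmp" then
    return nil   -- nil removes this child from the document
  end
  return node    -- returning the node leaves it in place
end)

local out = xml.tostring(doc)
print(out)  --> <list><keep/><keep/></list>
```

Note that maptags edits the document in place, so work on a copy if the original must be preserved.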
- xml_escape (str)
Escapes a string for safe use in xml.
Handles quotes (single and double), less-than, greater-than, and ampersand.
Parameters:
- str string string value to escape
Returns:
- escaped string
Usage:
    local esc = xml.xml_escape([["'<>&]])  --> "&quot;&apos;&lt;&gt;&amp;"
- xml_unescape (str)
Unescapes a string from xml.
Handles quotes (single and double), less-than, greater-than, and ampersand.
Parameters:
- str string string value to unescape
Returns:
- unescaped string
Usage:
    local unesc = xml.xml_unescape("&quot;&apos;&lt;&gt;&amp;")  --> [["'<>&]]
- tostring (doc[, b_ind[, t_ind[, a_ind[, xml_preface]]]])
Function to pretty-print an XML document.
Parameters:
- doc an XML document
- b_ind string or int an initial block-indent (required when t_ind is set) (optional)
- t_ind string or int a tag-indent for each level (required when a_ind is set) (optional)
- a_ind string or int if given, indent each attribute pair and put on a separate line (optional)
- xml_preface string or bool force prefacing with a default or custom XML declaration; if truthy, then <?xml version='1.0'?> will be used as the default. (optional)
Returns:
- a string representation
- Doc:tostring ([b_ind[, t_ind[, a_ind[, xml_preface="<?xml version='1.0'?>"]]]])
Method to pretty-print an XML document.
Invokes xml.tostring.
Parameters:
- b_ind string or int an initial indent (required when t_ind is set) (optional)
- t_ind string or int an indent for each level (required when a_ind is set) (optional)
- a_ind string or int if given, indent each attribute pair and put on a separate line (optional)
- xml_preface string force prefacing with default or custom (default "<?xml version='1.0'?>")
Returns:
- a string representation
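To illustrate how the indent and preface arguments interact, the sketch below renders the same document three ways. It assumes Penlight is loadable as `pl.xml`; the exact whitespace of the pretty-printed form is best checked by printing it with your installed version rather than taken from this example.

```lua
-- Assumes Penlight is installed and reachable via require.
local xml = require "pl.xml"

local doc = xml.elem("nodes", { xml.elem("node", { "alice", id = "1" }) })

-- No indent arguments: everything on one line.
local compact = xml.tostring(doc)

-- Empty block-indent plus a two-space tag-indent: one element per line.
local pretty = xml.tostring(doc, "", "  ")

-- A truthy xml_preface asks for the default XML declaration in front.
local with_preface = xml.tostring(doc, nil, nil, nil, true)

print(compact)
print(pretty)
print(with_preface)
```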
- Doc:get_text ()
get the full text value of an element.
Returns:
- a single string with all text elements concatenated
Usage:
    local doc = xml.new("main")
    doc:text("one")
    doc:add_child(xml.elem "two")
    doc:text("three")
    local t = doc:get_text()    --> "onethree"
- clone (doc[, strsubst])
Returns a copy of a document. The strsubst parameter is a callback with signature function(object, kind, parent).
Param kind has the following values, and parameters:
- "*TAG": object is the tag-name, parent is the Node object. Returns the new tag name.
- "*TEXT": object is the text-element, parent is the Node object. Returns the new text value.
- other strings not prefixed with *: kind is the attribute name, object is the attribute value, parent is the Node object. Returns the new attribute value.
Parameters:
- doc Node or string a Node object or string (text node)
- strsubst function an optional function for handling string copying which could do substitution, etc. (optional)
Returns:
- copy of the document
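The strsubst callback is called for tag names, text and attribute values alike, so it has to return something for every kind. A minimal sketch, assuming Penlight is loadable as `pl.xml`, that upper-cases text nodes only and passes everything else through unchanged:

```lua
-- Assumes Penlight is installed and reachable via require.
local xml = require "pl.xml"

local doc = xml.elem("greeting", "hello")

local copy = xml.clone(doc, function(object, kind, parent)
  if kind == "*TEXT" then
    return object:upper()  -- rewrite text content
  end
  return object            -- tag names and attribute values stay as they are
end)

local copied   = xml.tostring(copy)
local original = xml.tostring(doc)
print(copied)    --> <greeting>HELLO</greeting>
print(original)  --> <greeting>hello</greeting>
```

Because clone returns a new document, the original is left untouched.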
- Doc:filter ([strsubst])
Returns a copy of a document.
This is the method version of xml.clone.
Parameters:
- strsubst function an optional function for handling string copying (optional)
- compare (t1, t2)
Compare two documents or elements.
Equality is based on tag, child nodes (text and tags), attributes and order of those (order only fails if both are given, and not equal).
Parameters:
- t1 Node or string a Node object or string (text node)
- t2 Node or string a Node object or string (text node)
Returns:
- boolean true when the Nodes are equal.
- is_tag (d)
is this value a document element?
Parameters:
- d any value
Returns:
- boolean true if it is a table with property tag being a string value.
- walk (doc, depth_first, operation)
Calls a function recursively over Nodes in the document.
Will only call on tags, it will skip text nodes.
The function signature for operation is function(tag_name, Node).
Parameters:
- doc Node or string a Node object or string (text node)
- depth_first boolean visit child nodes first, then the current node
- operation function a function which will receive the current tag name and current node.
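The effect of depth_first is easiest to see by recording the visiting order. The following sketch assumes Penlight is loadable as `pl.xml`; the helper visit_order is hypothetical and exists only for this example.

```lua
-- Assumes Penlight is installed and reachable via require.
local xml = require "pl.xml"

-- <root><a><b/></a><c/></root>
local doc = xml.elem("root", {
  xml.elem("a", { xml.new("b") }),
  xml.new("c"),
})

-- Hypothetical helper: collect tag names in the order walk visits them.
local function visit_order(depth_first)
  local names = {}
  xml.walk(doc, depth_first, function(tag_name, node)
    names[#names + 1] = tag_name
  end)
  return table.concat(names, " ")
end

local top_down  = visit_order(false)
local bottom_up = visit_order(true)
print(top_down)   --> root a b c
print(bottom_up)  --> b a c root
```

With depth_first set, a node's children are handled before the node itself, which suits operations that summarise or prune subtrees.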
- parsehtml (s)
Parse a well-formed HTML file as a string.
Tags are case-insensitive, DOCTYPE is ignored, and empty elements can be .. empty.
Parameters:
- s the HTML
- basic_parse (s, all_text, html)
Parse a simple XML document using a pure Lua parser based on Roberto Ierusalimschy's original version.
Parameters:
- s the XML document to be parsed.
- all_text if true, preserves all whitespace. Otherwise only text containing non-whitespace is included.
- html if true, uses relaxed HTML rules for parsing
- Doc:match (pat)
-
does something...
Parameters:
- pat