Introduction
Lua Object Model (LOM) is a representation of XML elements through Lua data types. It is not meant to be 100% complete; the goal is to keep it simple. LuaExpat provides an implementation of LOM that takes an XML document and transforms it into a Lua table.
Characteristics
The model represents each XML element as a Lua table. A LOM table has three special characteristics:
- a special field called tag that holds the element's name;
- an optional field called attr that stores the element's attributes; and
- the array part of the table, which stores the element's children. A child is either an ordinary string or another XML element, itself represented by a Lua table following these same rules.
The special field attr is a Lua table that stores the XML element's attributes as <key>=<value> pairs. To preserve the order of the attributes (if necessary), the sequence of keys can be placed in the array part of this same table.
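The rules above can be exercised without a parser. The following is a minimal sketch, not part of LuaExpat: it builds a LOM table by hand and uses a hypothetical helper, to_xml, to walk it. The helper reads the attribute names from the array part of attr so that they come out in their original order, and it assumes that array part is present. It does not escape special characters.

```lua
-- Hand-built LOM table for: <item id="42" lang="en">hello <b>world</b></item>
local item = {
  tag = "item",
  -- array part keeps the attribute order, hash part keeps the values
  attr = { "id", "lang", id = "42", lang = "en" },
  "hello ",                          -- child 1: a text node
  { tag = "b", attr = {}, "world" }, -- child 2: a nested element
}

-- to_xml is a hypothetical helper (not a LuaExpat function).
-- It serialises a LOM node back to XML text, without any escaping.
local function to_xml(node)
  if type(node) == "string" then
    return node
  end
  local parts = { "<", node.tag }
  local attr = node.attr or {}
  -- ipairs visits only the array part, i.e. the ordered attribute names
  for _, name in ipairs(attr) do
    parts[#parts + 1] = string.format(' %s="%s"', name, attr[name])
  end
  parts[#parts + 1] = ">"
  -- children live in the array part of the element table
  for _, child in ipairs(node) do
    parts[#parts + 1] = to_xml(child)
  end
  parts[#parts + 1] = "</" .. node.tag .. ">"
  return table.concat(parts)
end

print(to_xml(item)) -- <item id="42" lang="en">hello <b>world</b></item>
```

Iterating attr with pairs instead of ipairs would visit both the ordered names and the name/value entries, in no guaranteed order, which is why the sketch relies on the array part.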
Functions
- lom.parse(string|function|table|file[, opts])
- Parses the input into the LOM table format and returns it. The input can be:
- string: the entire XML document as a string
- function: an iterator that returns the next chunk of the XML document on each call, and returns nil when finished
- table: an array-like table that contains the chunks that, combined, make up the XML document
- file: an open file handle from which the XML document will be read line-by-line, using read(). Note: the file will not be closed when done.
The optional second parameter opts is a table that supports the following options:
- separator (string): the namespace separator character to use; setting this enables namespace-aware parsing.
- threat (table): a threat protection options table. If provided, the threat protection parser will be used instead of the regular lxp parser.
On a parsing error, the function returns nil, err, line, col, pos.
- lom.find_elem(node, tag)
- Traverses the tree recursively, and returns the first element that matches the tag. Parameter tag (string) is the tag name to look for. The node table can be the result from the parse function, or any of its children.
- lom.list_children(node[, tag])
- Iterator returning all child tags of a node (non-recursive). It will only return children that are tags, and will skip text nodes. The node table can be the result from the parse function, or any of its children. If the optional parameter tag (string) is given, then the iterator will only return tags that match the tag name.
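As an illustration of how the three functions fit together, here is a short sketch. It assumes a LuaExpat installation that provides the functions listed above and can be loaded with require("lxp.lom"). The document content and the variable names are made up for the example.

```lua
local lom = require("lxp.lom")

-- An array-like table of chunks; concatenated they form one XML document.
local chunks = {
  '<library name="city">',
  '<book id="1">Dune</book>',
  '<note>closed on Sundays</note>',
  '<book id="2">Emma</book>',
  '</library>',
}

-- On failure parse returns nil followed by the error details.
local doc, err, line, col = lom.parse(chunks)
if not doc then
  error(string.format("parse failed at line %d, column %d: %s", line, col, err))
end

-- find_elem searches recursively and returns the first matching element.
local note = lom.find_elem(doc, "note")
print(note[1])

-- list_children visits direct child elements only, optionally filtered by tag.
local titles = {}
for book in lom.list_children(doc, "book") do
  titles[#titles + 1] = book.attr.id .. "=" .. book[1]
end
print(table.concat(titles, ", "))

-- A malformed document yields nil plus an error message instead of a table.
local bad, msg = lom.parse("<a><b></a>")
print(bad, msg)
```

Checking the first return value before using it is enough to tell a parsed tree from a failure, since a successful parse always returns a table.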
Examples
For a simple string like
s = [[<abc a1="A1" a2="A2">inside tag `abc'</abc>]]
A call like
tab = lxp.lom.parse(s)
Would result in a table equivalent to
tab = {
    ["attr"] = {
        [1] = "a1",
        [2] = "a2",
        ["a2"] = "A2",
        ["a1"] = "A1",
    },
    [1] = "inside tag `abc'",
    ["tag"] = "abc",
}
Now an example with an element nested inside another element
tab = lxp.lom.parse(
[[<qwerty q1="q1" q2="q2">
	<asdf>some text</asdf>
</qwerty>]]
)
The result would be a table equivalent to
tab = {
    [1] = "\n\t",
    [2] = {
        ["attr"] = {
        },
        [1] = "some text",
        ["tag"] = "asdf",
    },
    ["attr"] = {
        [1] = "q1",
        [2] = "q2",
        ["q2"] = "q2",
        ["q1"] = "q1",
    },
    [3] = "\n",
    ["tag"] = "qwerty",
}
Note that even the new-line and tab characters between elements are stored in the table, as ordinary string children.
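When such whitespace-only children are not wanted, they can be filtered out after parsing. The sketch below uses a hypothetical helper, strip_blank_text, which is not part of LuaExpat. It works on a hand-built table equivalent to the nested example, so it runs without the parser.

```lua
-- strip_blank_text is a hypothetical helper (not a LuaExpat function).
-- It removes whitespace-only text children from a LOM tree, in place.
local function strip_blank_text(node)
  local kept = {}
  for _, child in ipairs(node) do
    if type(child) == "table" then
      kept[#kept + 1] = strip_blank_text(child) -- recurse into elements
    elseif child:find("%S") then
      kept[#kept + 1] = child -- keep text containing a non-space character
    end
  end
  -- Clear the array part, then refill it with the children that were kept.
  for i = #node, 1, -1 do
    node[i] = nil
  end
  for i, child in ipairs(kept) do
    node[i] = child
  end
  return node
end

-- Table equivalent to the nested <qwerty> example.
local tab = {
  tag = "qwerty",
  attr = { "q1", "q2", q1 = "q1", q2 = "q2" },
  "\n\t",
  { tag = "asdf", attr = {}, "some text" },
  "\n",
}

strip_blank_text(tab)
print(#tab, tab[1].tag, tab[1][1]) -- 1, asdf and "some text", tab-separated
```

The tag and attr fields are left untouched because only the array part of each element is rewritten.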