Tables and Arrays
Python-style Lists
One of the elegant things about Lua is that tables do the job of both lists and dicts (as called in Python) or vectors and maps, (as called in C++), and they do it efficiently. However, if we are dealing with 'tables with numerical indices' we may as well call them lists and look for operations which particularly make sense for lists. The Penlight List class was originally written by Nick Trout for Lua 5.0, and translated to 5.1 and extended by myself. It seemed that borrowing from Python was a good idea, and this eventually grew into Penlight.
Here is an example showing List in action; it redefines __tostring
, so that
it can print itself out more sensibly:
> List = require 'pl.List' --> automatic with require 'pl' <--- > l = List() > l:append(10) > l:append(20) > = l {10,20} > l:extend {30,40} {10,20,30,40} > l:insert(1,5) {5,10,20,30,40} > = l:pop() 40 > = l {5,10,20,30} > = l:index(30) 4 > = l:contains(30) true > = l:reverse() ---> note: doesn't make a copy! {30,20,10,5}
Although methods like sort
and reverse
operate in-place and change the list,
they do return the original list. This makes it possible to do method chaining,
like ls = ls:append(10):append(20):reverse():append(1)
. But (and this is an
important but) no extra copy is made, so ls
does not change identity. List
objects (like tables) are mutable, unlike strings. If you want a copy of a
list, then List(ls)
will do the job, i.e. it acts like a copy constructor.
However, if passed any other table, List will just set the metatable of the
table and not make a copy.
A particular feature of Python lists is slicing. This is fully supported in this version of List, except we use 1-based indexing. So List.slice works rather like string.sub:
> l = List {10,20,30,40} > = l:slice(1,1) ---> note: creates a new list! {10} > = l:slice(2,2) {20} > = l:slice(2,3) {20,30} > = l:slice(2,-2) {20,30} > = l:slice_assign(2,2,{21,22,23}) {10,21,22,23,30,40} > = l:chop(1,1) {21,22,23,30,40}
Functions like slice_assign
and chop
modify the list; the first is equivalent
to Pythonl[i1:i2] = seq
and the second to del l[i1:i2]
.
List objects are ultimately just Lua 'list-like' tables, but they have extra
operations defined on them, such as equality and concatention. For regular
tables, equality is only true if the two tables are identical objects, whereas
two lists are equal if they have the same contents, i.e. that l1[i]==l2[i]
for
all elements.
> l1 = List {1,2,3} > l2 = List {1,2,3} > = l1 == l2 true > = l1..l2 {1,2,3,1,2,3}
The List constructor can be passed a function. If so, it's assumed that this is an iterator function that can be repeatedly called to generate a sequence. One such function is io.lines; the following short, intense little script counts the number of lines in standard input:
-- linecount.lua require 'pl' ls = List(io.lines()) print(#ls)
List.iterate captures what List considers a sequence. In particular, it can also iterate over all 'characters' in a string:
> for ch in List.iterate 'help' do io.write(ch,' ') end h e l p >
Since the function iterate
is used internally by the List constructor,
strings can be made into lists of character strings very easily.
There are a number of operations that go beyond the standard Python methods. For instance, you can partition a list into a table of sublists using a function. In the simplest form, you use a predicate (a function returning a boolean value) to partition the list into two lists, one of elements matching and another of elements not matching. But you can use any function; if we use type then the keys will be the standard Lua type names.
> ls = List{1,2,3,4} > ops = require 'pl.operator' > ls:partition(function(x) return x > 2 end) {false={1,2},true={3,4}} > ls = List{'one',math.sin,List{1},10,20,List{1,2}} > ls:partition(type) {function={function: 00369110},string={one},number={10,20},table={{1},{1,2}}}
This is one List method which returns a table which is not a List. Bear in
mind that you can always call a List method on a plain table argument, so
List.partition(t,type)
works as expected. But these functions will only operate
on the array part of the table.
The 'nominal' type of the returned table is pl.Multimap
, which describes a mapping
between keys and multiple values. This does not mean that pl.Multimap
is automatically
loaded whenever you use partition
(or List for that matter); this is one of the
standard metatables which are only filled out when the appropriate module is loaded.
This allows tables to be tagged appropriately without causing excessive coupling.
Stacks occur everywhere in computing. List supports stack-like operations;
there is already pop
(remove and return last value) and append
acts like
push
(add a value to the end). push
is provided as an alias for append
, and
the other stack operation (size) is simply the size operator #
. Queues can
also be implemented; you use pop
to take values out of the queue, and put
to
insert a value at the beginning.
You may derive classes from List, and since the list-returning methods
are covariant, the result of slice
etc will return lists of the derived type,
not List. For instance, consider the specialization of a List type that contains
numbers in tests/test-list.lua
:
n1 = NA{10,20,30} n2 = NA{1,2,3} ns = n1 + 2*n2 asserteq(ns,{12,24,36}) min,max = ns:slice(1,2):minmax() asserteq(T(min,max),T(12,24)) asserteq(n1:normalize():sum(),1,1e-8)
Map and Set classes
The Map class exposes what Python would call a 'dict' interface, and accesses
the hash part of the table. The name 'Map' is used to emphasize the interface,
not the implementation; it is an object which maps keys onto values; m['alice']
or the equivalent m.alice
is the access operation. This class also provides
explicit set
and get
methods, which are trivial for regular maps but get
interesting when Map is subclassed. The other operation is update
, which
extends a map by copying the keys and values from another table, perhaps
overwriting existing keys:
> Map = require 'pl.Map' > m = Map{one=1,two=2} > m:update {three=3,four=4,two=20} > = m == M{one=1,two=20,three=3,four=4} true
The method values
returns a list of the values, and keys
returns a list of
the keys; there is no guarantee of order. getvalues
is given a list of keys and
returns a list of values associated with these keys:
> m = Map{one=1,two=2,three=3} > = m:getvalues {'one','three'} {1,3} > = m:getvalues(m:keys()) == m:values() true
When querying the value of a Map, it is best to use the get
method:
> print(m:get 'one', m:get 'two') 1 2
The reason is that m[key]
can be ambiguous; due to the current implementation,
m["get"]
will always succeed, because if a value is not present in the map, it
will be looked up in the Map metatable, which contains a method get
. There is
currently no simple solution to this annoying restriction.
There are some useful classes which inherit from Map. An OrderedMap behaves
like a Map but keeps its keys in order if you use its set
method to add keys
and values. Like all the 'container' classes in Penlight, it defines an iter
method for iterating over its values; this will return the keys and values in the
order of insertion; the keys
and values
methods likewise.
A MultiMap allows multiple values to be associated with a given key. So set
(as before) takes a key and a value, but calling it with the same key and a
different value does not overwrite but adds a new value. get
(or using []
)
will return a list of values.
A Set can be seen as a special kind of Map, where all the values are true
,
the keys are the values, and the order is not important. So in this case
Set.values is defined to return a list of the keys. Sets can display
themselves, and the basic operations like union
(+
) and intersection
(*
)
are defined.
> Set = require 'pl.Set' > = Set{'one','two'} == Set{'two','one'} true > fruit = Set{'apple','banana','orange'} > = fruit['banana'] true > = fruit['hazelnut'] nil > = fruit:values() {apple,orange,banana} > colours = Set{'red','orange','green','blue'} > = fruit,colours [apple,orange,banana] [blue,green,orange,red] > = fruit+colours [blue,green,apple,red,orange,banana] > = fruit*colours [orange]
There are also the functions Set.difference and Set.symmetric_difference
. The
first answers the question 'what fruits are not colours?' and the second 'what
are fruits and colours but not both?'
> = fruit - colours [apple,banana] > = fruit ^ colours [blue,green,apple,red,banana]
Adding elements to a set is simply fruit['peach'] = true
and removing is
fruit['apple'] = nil
. To make this simplicity work properly, the Set class has no
methods - either you use the operator forms or explicitly use Set.intersect
etc. In this way we avoid the ambiguity that plagues Map.
Useful Operations on Tables
Some notes on terminology: Lua tables are usually list-like (like an array) or map-like (like an associative array or dict); they can of course have a list-like and a map-like part. Some of the table operations only make sense for list-like tables, and some only for map-like tables. (The usual Lua terminology is the array part and the hash part of the table, which reflects the actual implementation used; it is more accurate to say that a Lua table is an associative map which happens to be particularly efficient at acting like an array.)
The functions provided in table provide all the basic manipulations on Lua tables, but as we saw with the List class, it is useful to build higher-level operations on top of those functions. For instance, to copy a table involves this kind of loop:
local res = {} for k,v in pairs(T) do res[k] = v end return res
The tablex module provides this as copy, which does a shallow copy of a table. There is also deepcopy which goes further than a simple loop in two ways; first, it also gives the copy the same metatable as the original (so it can copy objects like List above) and any nested tables will also be copied, to arbitrary depth. There is also icopy which operates on list-like tables, where you can set optionally set the start index of the source and destination as well. It ensures that any left-over elements will be deleted:
asserteq(icopy({1,2,3,4,5,6},{20,30}),{20,30}) -- start at 1 asserteq(icopy({1,2,3,4,5,6},{20,30},2),{1,20,30}) -- start at 2 asserteq(icopy({1,2,3,4,5,6},{20,30},2,2),{1,30}) -- start at 2, copy from 2
(This code from the tablex test module shows the use of pl.test.asserteq)
Whereas, move overwrites but does not delete the rest of the destination:
asserteq(move({1,2,3,4,5,6},{20,30}),{20,30,3,4,5,6}) asserteq(move({1,2,3,4,5,6},{20,30},2),{1,20,30,4,5,6}) asserteq(move({1,2,3,4,5,6},{20,30},2,2),{1,30,3,4,5,6})
(The difference is somewhat like that between C's strcpy
and memmove
.)
To summarize, use copy or deepcopy to make a copy of an arbitrary table. To copy into a map-like table, use update; to copy into a list-like table use icopy, and move if you are updating a range in the destination.
To complete this set of operations, there is insertvalues which works like table.insert except that one provides a table of values to be inserted, and removevalues which removes a range of values.
asserteq(insertvalues({1,2,3,4},2,{20,30}),{1,20,30,2,3,4}) asserteq(insertvalues({1,2},{3,4}),{1,2,3,4})
Another example:
> T = require 'pl.tablex' > t = {10,20,30,40} > = T.removevalues(t,2,3) {10,40} > = T.insertvalues(t,2,{20,30}) {10,20,30,40}
In a similar spirit to deepcopy, deepcompare will take two tables and return true only if they have exactly the same values and structure.
> t1 = {1,{2,3},4} > t2 = deepcopy(t1) > = t1 == t2 false > = deepcompare(t1,t2) true
find will return the index of a given value in a list-like table. Note that like string.find you can specify an index to start searching, so that all instances can be found. There is an optional fourth argument, which makes the search start at the end and go backwards, so we could define rfind like so:
function rfind(t,val,istart) return tablex.find(t,val,istart,true) end
find does a linear search, so it can slow down code that depends on it. If efficiency is required for large tables, consider using an index map. index_map will return a table where the keys are the original values of the list, and the associated values are the indices. (It is almost exactly the representation needed for a set.)
> t = {'one','two','three'} > = tablex.find(t,'two') 2 > = tablex.find(t,'four') nil > il = tablex.index_map(t) > = il['two'] 2 > = il.two 2
A version of index_map called makeset is also provided, where the values are
just true
. This is useful because two such sets can be compared for equality
using deepcompare:
> = deepcompare(makeset {1,2,3},makeset {2,1,3}) true
Consider the problem of determining the new employees that have joined in a period. Assume we have two files of employee names:
(last-month.txt)
smith,john
brady,maureen
mongale,thabo
(this-month.txt)
smith,john
smit,johan
brady,maureen
mogale,thabo
van der Merwe,Piet
To find out differences, just make the employee lists into sets, like so:
require 'pl' function read_employees(file) local ls = List(io.lines(file)) -- a list of employees return tablex.makeset(ls) end last = read_employees 'last-month.txt' this = read_employees 'this-month.txt' -- who is in this but not in last? diff = tablex.difference(this,last) -- in a set, the keys are the values... for e in pairs(diff) do print(e) end -- *output* -- van der Merwe,Piet -- smit,johan
The difference operation is easy to write and read:
for e in pairs(this) do if not last[e] then print(e) end end
Using difference here is not that it is a tricky thing to code, it is that you are stating your intentions clearly to other readers of your code. (And naturally to your future self, in six months time.)
find_if will search a table using a function. The optional third argument is a value which will be passed as a second argument to the function. pl.operator provides the Lua operators conveniently wrapped as functions, so the basic comparison functions are available:
> ops = require 'pl.operator' > = tablex.find_if({10,20,30,40},ops.gt,20) 3 true
Note that find_if will also return the actual value returned by the function,
which of course is usually just true
for a boolean function, but any value
which is not nil
and not false
can be usefully passed back.
deepcompare does a thorough recursive comparison, but otherwise using the default equality operator. compare allows you to specify exactly what function to use when comparing two list-like tables, and compare_no_order is true if they contain exactly the same elements. Do note that the latter does not need an explicit comparison function - in this case the implementation is actually to compare the two sets, as above:
> = compare_no_order({1,2,3},{2,1,3}) true > = compare_no_order({1,2,3},{2,1,3},'==') true
(Note the special string '==' above; instead of saying ops.gt
or ops.eq
we
can use the strings '>' or '==' respectively.)
sort and sortv return iterators that will iterate through the sorted elements of a table. sort iterates by sorted key order, and sortv iterates by sorted value order. For example, given a table with names and ages, it is trivial to iterate over the elements:
> t = {john=27,jane=31,mary=24} > for name,age in tablex.sort(t) do print(name,age) end jane 31 john 27 mary 24 > for name,age in tablex.sortv(t) do print(name,age) end mary 24 john 27 jane 31
There are several ways to merge tables in PL. If they are list-like, then see the operations defined by pl.List, like concatenation. If they are map-like, then merge provides two basic operations. If the third arg is false, then the result only contains the keys that are in common between the two tables, and if true, then the result contains all the keys of both tables. These are in fact generalized set union and intersection operations:
> S1 = {john=27,jane=31,mary=24} > S2 = {jane=31,jones=50} > = tablex.merge(S1, S2, false) {jane=31} > = tablex.merge(S1, S2, true) {mary=24,jane=31,john=27,jones=50}
When working with tables, you will often find yourself writing loops like in the first example. Loops are second nature to programmers, but they are often not the most elegant and self-describing way of expressing an operation. Consider the map function, which creates a new table by applying a function to each element of the original:
> = map(math.sin, {1,2,3,4}) { 0.84, 0.91, 0.14, -0.76} > = map(function(x) return x*x end, {1,2,3,4}) {1,4,9,16}
map saves you from writing a loop, and the resulting code is often clearer, as well as being shorter. This is not to say that 'loops are bad' (although you will hear that from some extremists), just that it's good to capture standard patterns. Then the loops you do write will stand out and acquire more significance.
pairmap is interesting, because the function works with both the key and the value.
> t = {fred=10,bonzo=20,alice=4} > = pairmap(function(k,v) return v end, t) {4,10,20} > = pairmap(function(k,v) return k end, t) {'alice','fred','bonzo'}
(These are common enough operations that the first is defined as values and the second as keys.) If the function returns two values, then the second value is considered to be the new key:
> = pairmap(t,function(k,v) return v+10, k:upper() end) {BONZO=30,FRED=20,ALICE=14}
map2 applies a function to two tables:
> map2(ops.add,{1,2},{10,20}) {11,22} > map2('*',{1,2},{10,20}) {10,40}
The various map operations generate tables; reduce applies a function of two arguments over a table and returns the result as a scalar:
> reduce ('+', {1,2,3}) 6 > reduce ('..', {'one','two','three'}) 'onetwothree'
Finally, zip sews different tables together:
> = zip({1,2,3},{10,20,30}) {{1,10},{2,20},{3,30}}
Browsing through the documentation, you will find that tablex and List share
methods. For instance, tablex.imap and List.map are basically the same
function; they both operate over the array-part of the table and generate another
table. This can also be expressed as a list comprehension C 'f(x) for x' (t)
which makes the operation more explicit. So why are there different ways to do
the same thing? The main reason is that not all tables are Lists: the expression
ls:map('#')
will return a list of the lengths of any elements of ls
. A list
is a thin wrapper around a table, provided by the metatable List. Sometimes you
may wish to work with ordinary Lua tables; the List interface is not a
compulsory way to use Penlight table operations.
Operations on two-dimensional tables
Two-dimensional tables are of course easy to represent in Lua, for instance
{{1,2},{3,4}}
where we store rows as subtables and index like so A[col][row]
.
This is the common representation used by matrix libraries like
LuaMatrix. pl.array2d does not provide
matrix operations, since that is the job for a specialized library, but rather
provides generalizations of the higher-level operations provided by pl.tablex
for one-dimensional arrays.
iter is a useful generalization of ipairs. (The extra parameter determines whether you want the indices as well.)
> a = {{1,2},{3,4}} > for i,j,v in array2d.iter(a,true) do print(i,j,v) end 1 1 1 1 2 2 2 1 3 2 2 4
Note that you can always convert an arbitrary 2D array into a 'list of lists'
with List(tablex.map(List,a))
map will apply a function over all elements (notice that extra arguments can be
provided, so this operation is in effect function(x) return x-1 end
)
> array2d.map('-',a,1) {{0,1},{2,3}}
2D arrays are stored as an array of rows, but columns can be extracted:
> array2d.column(a,1) {1,3}
There are three equivalents to tablex.reduce. You can either reduce along the rows (which is the most efficient) or reduce along the columns. Either one will give you a 1D array. And reduce2 will apply two operations: the first one reduces the rows, and the second reduces the result.
> array2d.reduce_rows('+',a) {3,7} > array2d.reduce_cols('+',a) {4,6} > -- same as tablex.reduce('*',array.reduce_rows('+',a)) > array2d.reduce2('*','+',a) 21 `
tablex.map2 applies an operation to two tables, giving another table. array2d.map2 does this for 2D arrays. Note that you have to provide the rank of the arrays involved, since it's hard to always correctly deduce this from the data:
> b = {{10,20},{30,40}} > a = {{1,2},{3,4}} > = array2d.map2('+',2,2,a,b) -- two 2D arrays {{11,22},{33,44}} > = array2d.map2('+',1,2,{10,100},a) -- 1D, 2D {{11,102},{13,104}} > = array2d.map2('*',2,1,a,{1,-1}) -- 2D, 1D {{1,-2},{3,-4}}
Of course, you are not limited to simple arithmetic. Say we have a 2D array of strings, and wish to print it out with proper right justification. The first step is to create all the string lengths by mapping string.len over the array, the second is to reduce this along the columns using math.max to get maximum column widths, and last, apply stringx.rjust with these widths.
maxlens = reduce_cols(math.max,map('#',lines)) lines = map2(stringx.rjust,2,1,lines,maxlens)
There is product which returns the Cartesian product of two 1D arrays. The result is a 2D array formed from applying the function to all possible pairs from the two arrays.
> array2d.product('{}',{1,2},{'a','b'}) {{{1,'b'},{2,'a'}},{{1,'a'},{2,'b'}}}
There is a set of operations which work in-place on 2D arrays. You can
swap_rows and swap_cols; the first really is a simple one-liner, but the idea
here is to give the operation a name. remove_row and remove_col are
generalizations of table.remove. Likewise, extract_rows and extract_cols
are given arrays of indices and discard anything else. So, for instance,
extract_cols(A,{2,4})
will leave just columns 2 and 4 in the array.
List.slice is often useful on 1D arrays; slice does the same thing, but is generally given a start (row,column) and a end (row,column).
> A = {{1,2,3},{4,5,6},{7,8,9}} > B = slice(A,1,1,2,2) > write(B) 1 2 4 5 > B = slice(A,2,2) > write(B,nil,'%4.1f') 5.0 6.0 8.0 9.0
Here write is used to print out an array nicely; the second parameter is nil
,
which is the default (stdout) but can be any file object and the third parameter
is an optional format (as used in string.format).
parse_range will take a spreadsheet range like 'A1:B2' or 'R1C1:R2C2' and return the range as four numbers, which can be passed to slice. The rule is that slice will return an array of the appropriate shape depending on the range; if a range represents a row or a column, the result is 1D, otherwise 2D.
This applies to iter as well, which can also optionally be given a range:
> for i,j,v in iter(A,true,2,2) do print(i,j,v) end 2 2 5 2 3 6 3 2 8 3 3 9
new will construct a new 2D array with the given dimensions. You provide an
initial value for the elements, which is interpreted as a function if it's
callable. With L
being utils.string_lambda we then have the following way to
make an identity matrix:
asserteq( array.new(3,3,L'|i,j| i==j and 1 or 0'), {{1,0,0},{0,1,0},{0,0,1}} )
Please note that most functions in array2d are covariant, that is, they
return an array of the same type as they receive. In particular, any objects
created with data.new or matrix.new
will remain data or matrix objects when
reshaped or sliced, etc. Data objects have the array2d functions available as
methods.