Functional Programming
Sequences
A Lua iterator (in its simplest form) is a function which can be repeatedly
called to return a set of one or more values. The for in
statement understands
these iterators, and loops until the function returns nil
. There are standard
sequence adapters for tables in Lua (ipairs and pairs), and io.lines
returns an iterator over all the lines in a file. In the Penlight libraries, such
iterators are also called sequences. A sequence of single values (say from
io.lines) is called single-valued, whereas the sequence defined by pairs is
double-valued.
pl.seq provides a number of useful iterators, and some functions which operate on sequences. At first sight this example looks like an attempt to write Python in Lua, (with the sequence being inclusive):
> for i in seq.range(1,4) do print(i) end 1 2 3 4
But range is actually equivalent to Python's xrange
, since it generates a
sequence, not a list. To get a list, use seq.copy(seq.range(1,10))
, which
takes any single-value sequence and makes a table from the result. seq.list is
like ipairs except that it does not give you the index, just the value.
> for x in seq.list {1,2,3} do print(x) end 1 2 3
enum takes a sequence and turns it into a double-valued sequence consisting of
a sequence number and the value, so enum(list(ls))
is actually equivalent to
ipairs. A more interesting example prints out a file with line numbers:
for i,v in seq.enum(io.lines(fname)) do print(i..' '..v) end
Sequences can be combined, either by 'zipping' them or by concatenating them.
> for x,y in seq.zip(l1,l2) do print(x,y) end 10 1 20 2 30 3 > for x in seq.splice(l1,l2) do print(x) end 10 20 30 1 2 3
seq.printall is useful for printing out single-valued sequences, and provides some finer control over formatting, such as a delimiter, the number of fields per line, and a format string to use (@see string.format)
> seq.printall(seq.random(10)) 0.0012512588885159 0.56358531449324 0.19330423902097 .... > seq.printall(seq.random(10), ',', 4, '%4.2f') 0.17,0.86,0.71,0.51 0.30,0.01,0.09,0.36 0.15,0.17,
map will apply a function to a sequence.
> seq.printall(seq.map(string.upper, {'one','two'})) ONE TWO > seq.printall(seq.map('+', {10,20,30}, 1)) 11 21 31
filter will filter a sequence using a boolean function (often called a predicate). For instance, this code only prints lines in a file which are composed of digits:
for l in seq.filter(io.lines(file), stringx.isdigit) do print(l) end
The following returns a table consisting of all the positive values in the
original table (equivalent to tablex.filter(ls, '>', 0)
)
ls = seq.copy(seq.filter(ls, '>', 0))
We're already encountered seq.sum when discussing input.numbers. This can also be expressed with seq.reduce:
> seq.reduce(function(x,y) return x + y end, seq.list{1,2,3,4}) 10
seq.reduce applies a binary function in a recursive fashion, so that:
reduce(op,{1,2,3}) => op(1,reduce(op,{2,3}) => op(1,op(2,3))
it's now possible to easily generate other cumulative operations; the standard operations declared in pl.operator are useful here:
> ops = require 'pl.operator' > -- can also say '*' instead of ops.mul > = seq.reduce(ops.mul,input.numbers '1 2 3 4') 24
There are functions to extract statistics from a sequence of numbers:
> l1 = List {10,20,30} > l2 = List {1,2,3} > = seq.minmax(l1) 10 30 > = seq.sum(l1) 60 3
It is common to get sequences where values are repeated, say the words in a file. count_map will take such a sequence and count the values, returning a table where the keys are the unique values, and the value associated with each key is the number of times they occurred:
> t = seq.count_map {'one','fred','two','one','two','two'} > = t {one=2,fred=1,two=3}
This will also work on numerical sequences, but you cannot expect the result to be a proper list, i.e. having no 'holes'. Instead, you always need to use pairs to iterate over the result - note that there is a hole at index 5:
> t = seq.count_map {1,2,4,2,2,3,4,2,6} > for k,v in pairs(t) do print(k,v) end 1 1 2 4 3 1 4 2 6 1
unique
uses count_map to return a list of the unique values, that is, just
the keys of the resulting table.
last turns a single-valued sequence into a double-valued sequence with the current value and the last value:
> for current,last in seq.last {10,20,30,40} do print (current,last) end 20 10 30 20 40 30
This makes it easy to do things like identify repeated lines in a file, or construct differences between values. filter can handle double-valued sequences as well, so one could filter such a sequence to only return cases where the current value is less than the last value by using operator.lt or just '<'. This code then copies the resulting code into a table.
> ls = {10,9,10,3} > = seq.copy(seq.filter(seq.last(s),'<')) {9,3}
Sequence Wrappers
The functions in pl.seq cover the common patterns when dealing with sequences,
but chaining these functions together can lead to ugly code. Consider the last
example of the previous section; seq is repeated three times and the resulting
expression has to be read right-to-left. The first issue can be helped by local
aliases, so that the expression becomes copy(filter(last(s),'<'))
but the
second issue refers to the somewhat unnatural order of functional application.
We tend to prefer reading operations from left to right, which is one reason why
object-oriented notation has become popular. Sequence adapters allow this
expression to be written like so:
seq(s):last():filter('<'):copy()
With this notation, the operation becomes a chain of method calls running from left to right.
'Sequence' is not a basic Lua type, they are generally functions or callable
objects. The expression seq(s)
wraps a sequence in a sequence wrapper, which
is an object which understands all the functions in pl.seq as methods. This
object then explicitly represents sequences.
As a special case, the constructor (which is when you call the table seq) will make a wrapper for a plain list-like table. Here we apply the length operator to a sequence of strings, and print them out.
> seq{'one','tw','t'} :map '#' :printall() 3 2 1
As a convenience, there is a function seq.lines which behaves just like io.lines except it wraps the result as an explicit sequence type. This takes the first 10 lines from standard input, makes it uppercase, turns it into a sequence with a count and the value, glues these together with the concatenation operator, and finally prints out the sequence delimited by a newline.
seq.lines():take(10):upper():enum():map('..'):printall '\n'
Note the method upper
, which is not a seq function. if an unknown method is
called, sequence wrappers apply that method to all the values in the sequence
(this is implicit use of mapmethod)
It is straightforward to create custom sequences that can be used in this way. On
Unix, /dev/random
gives you an endless sequence of random bytes, so we use
take to limit the sequence, and then map to scale the result into the desired
range. The key step is to use seq to wrap the iterator function:
-- random.lua local seq = require 'pl.seq' function dev_random() local f = io.open('/dev/random') local byte = string.byte return seq(function() -- read two bytes into a string and convert into a 16-bit number local s = f:read(2) return byte(s,1) + 256*byte(s,2) end) end -- print 10 random numbers from 0 to 1 ! dev_random():take(10):map('%',100):map('/',100):printall ','
Another Linux one-liner depends on the /proc
filesystem and makes a list of all
the currently running processes:
pids = seq(lfs.dir '/proc'):filter(stringx.isdigit):map(tonumber):copy()
This version of Penlight has an experimental feature which relies on the fact that all Lua types can have metatables, including functions. This makes implicit sequence wrapping possible:
> seq.import() > seq.random(5):printall(',',5,'%4.1f') 0.0, 0.1, 0.4, 0.1, 0.2
This avoids the awkward seq(seq.random(5))
construction. Or the iterator can
come from somewhere else completely:
> ('one two three'):gfind('%a+'):printall(',') one,two,three,
After seq.import
, it is no longer necessary to explicitly wrap sequence
functions.
But there is a price to pay for this convenience. Every function is affected, so that any function can be used, appropriate or not:
> math.sin:printall() ..seq.lua:287: bad argument #1 to '(for generator)' (number expected, got nil) > a = tostring > = a:find(' ') function: 0042C920
What function is returned? It's almost certain to be something that makes no sense in the current context. So implicit sequences may make certain kinds of programming mistakes harder to catch - they are best used for interactive exploration and small scripts.
List Comprehensions
List comprehensions are a compact way to create tables by specifying their elements. In Python, you can say this:
ls = [x for x in range(5)] # == [0,1,2,3,4]
In Lua, using pl.comprehension:
> C = require('pl.comprehension').new() > = C ('x for x=1,10') () {1,2,3,4,5,6,7,8,9,10}
C
is a function which compiles a list comprehension string into a function.
In this case, the function has no arguments. The parentheses are redundant for a
function taking a string argument, so this works as well:
> = C 'x^2 for x=1,4' () {1,4,9,16} > = C '{x,x^2} for x=1,4' () {{1,1},{2,4},{3,9},{4,16}}
Note that the expression can be any function of the variable x
!
The basic syntax so far is <expr> for <set>
, where <set>
can be anything that
the Lua for
statement understands. <set>
can also just be the variable, in
which case the values will come from the argument of the comprehension. Here
I'm emphasizing that a comprehension is a function which can take a list argument:
> = C '2*x for x' {1,2,3} {2,4,6} > dbl = C '2*x for x' > = dbl {10,20,30} {20,40,60}
Here is a somewhat more explicit way of saying the same thing; _1
is a
placeholder referring to the first argument passed to the comprehension.
> = C '2*x for _,x in pairs(_1)' {10,20,30} {20,40,60} > = C '_1(x) for x'(tostring,{1,2,3,4}) {'1','2','3','4'}
This extended syntax is useful when you wish to collect the result of some iterator, such as io.lines. This comprehension creates a function which creates a table of all the lines in a file:
> f = io.open('array.lua') > lines = C 'line for line in _1:lines()' (f) > = #lines 118
There are a number of functions that may be applied to the result of a comprehension:
> = C 'min(x for x)' {1,44,0} 0 > = C 'max(x for x)' {1,44,0} 44 > = C 'sum(x for x)' {1,44,0} 45
(These are equivalent to a reduce operation on a list.)
After the for
part, there may be a condition, which filters the output. This
comprehension collects the even numbers from a list:
> = C 'x for x if x % 2 == 0' {1,2,3,4,5} {2,4}
There may be a number of for
parts:
> = C '{x,y} for x = 1,2 for y = 1,2' () {{1,1},{1,2},{2,1},{2,2}} > = C '{x,y} for x for y' ({1,2},{10,20}) {{1,10},{1,20},{2,10},{2,20}}
These comprehensions are useful when dealing with functions of more than one variable, and are not so easily achieved with the other Penlight functional forms.
Creating Functions from Functions
Lua functions may be treated like any other value, although of course you cannot
multiply or add them. One operation that makes sense is function composition,
which chains function calls (so (f * g)(x)
is f(g(x))
.)
> func = require 'pl.func' > printf = func.compose(io.write,string.format) > printf("hello %s\n",'world') hello world true
Many functions require you to pass a function as an argument, say to apply to all values of a sequence or as a callback. Often useful functions have the wrong number of arguments. So there is a need to construct a function of one argument from one of two arguments, binding the extra argument to a given value.
partial application takes a function of n arguments and returns a function of n-1 arguments where the first argument is bound to some value:
> p2 = func.bind1(print,'start>') > p2('hello',2) start> hello 2 > ops = require 'pl.operator' > = tablex.filter({1,-2,10,-1,2},bind1(ops.gt,0)) {-2,-1} > tablex.filter({1,-2,10,-1,2},bind1(ops.le,0)) {1,10,2}
The last example unfortunately reads backwards, because bind1 always binds the
first argument! Also unfortunately, in my youth I confused 'currying' with
'partial application', so the old name for bind1 is curry
- this alias still exists.
This is a specialized form of function argument binding. Here is another way to say the print example:
> p2 = func.bind(print,'start>',func._1,func._2) > p2('hello',2) start> hello 2
where _1
and _2
are placeholder variables, corresponding to the first and
second argument respectively.
Having func all over the place is distracting, so it's useful to pull all of pl.func into the local context. Here is the filter example, this time the right way around:
> utils.import 'pl.func' > tablex.filter({1,-2,10,-1,2},bind(ops.gt, _1, 0)) {1,10,2}
tablex.merge does a general merge of two tables. This example shows the usefulness of binding the last argument of a function.
> S1 = {john=27, jane=31, mary=24} > S2 = {jane=31, jones=50} > intersection = bind(tablex.merge, _1, _2, false) > union = bind(tablex.merge, _1, _2, true) > = intersection(S1,S2) {jane=31} > = union(S1,S2) {mary=24,jane=31,john=27,jones=50}
When using bind with print, we got a function of precisely two arguments,
whereas we really want our function to use varargs like print. This is the role
of _0
:
> _DEBUG = true > p = bind(print,'start>', _0) return function (fn,_v1) return function(...) return fn(_v1,...) end end > p(1,2,3,4,5) start> 1 2 3 4 5
I've turned on the global _DEBUG
flag, so that the function generated is
printed out. It is actually a function which generates the required function;
the first call binds the value of _v1
to 'start>'.
Placeholder Expressions
A common pattern in Penlight is a function which applies another function to all elements in a table or a sequence, such as tablex.map or seq.filter. Lua does anonymous functions well, although they can be a bit tedious to type:
> = tablex.map(function(x) return x*x end, {1,2,3,4}) {1,4,9,16}
pl.func allows you to define placeholder expressions, which can cut down on
the typing required, and also make your intent clearer. First, we bring contents
of pl.func into our context, and then supply an expression using placeholder
variables, such as _1
,_2
,etc. (C++ programmers will recognize this from the
Boost libraries.)
> utils.import 'pl.func' > = tablex.map(_1*_1, {1,2,3,4}) {1,4,9,16}
Functions of up to 5 arguments can be generated.
> = tablex.map2(_1+_2,{1,2,3}, {10,20,30}) {11,22,33}
These expressions can use arbitrary functions, although they must first be registered with the functional library. func.register brings in a single function, and func.import brings in a whole table of functions, such as math.
> sin = register(math.sin) > = tablex.map(sin(_1), {1,2,3,4}) {0.8414709848079,0.90929742682568,0.14112000805987,-0.75680249530793} > import 'math' > = tablex.map(cos(2*_1),{1,2,3,4}) {-0.41614683654714,-0.65364362086361,0.96017028665037,-0.14550003380861}
A common operation is calling a method of a set of objects:
> = tablex.map(_1:sub(1,1), {'one','four','x'}) {'o','f','x'}
There are some restrictions on what operators can be used in PEs. For instance,
because the __len
metamethod cannot be overridden by plain Lua tables, we need
to define a special function to express `#_1':
> = tablex.map(Len(_1), {'one','four','x'}) {3,4,1}
Likewise for comparison operators, which cannot be overloaded for different types, and thus also have to be expressed as a special function:
> = tablex.filter(Gt(_1,0), {1,-1,2,4,-3}) {1,2,4}
It is useful to express the fact that a function returns multiple values. For instance, tablex.pairmap expects a function that will be called with the key and the value, and returns the new value and the key, in that order.
> = pairmap(Args(_2,_1:upper()),{fred=1,alice=2}) {ALICE=2,FRED=1}
PEs cannot contain nil
values, since PE function arguments are represented as
an array. Instead, a special value called Nil
is provided. So say
_1:f(Nil,1)
instead of _1:f(nil,1)
.
A placeholder expression cannot be automatically used as a Lua function. The
technical reason is that the call operator must be overloaded to construct
function calls like _1(1)
. If you want to force a PE to return a function, use
func.I.
> = tablex.map(_1(10),{I(2*_1),I(_1*_1),I(_1+2)}) {20,100,12}
Here we make a table of functions taking a single argument, and then call them all with a value of 10.
The essential idea with PEs is to 'quote' an expression so that it is not
immediately evaluated, but instead turned into a function that can be applied
later to some arguments. The basic mechanism is to wrap values and placeholders
so that the usual Lua operators have the effect of building up an expression
tree. (It turns out that you can do symbolic algebra using PEs, see
symbols.lua in the examples directory, and its test runner testsym.lua
, which
demonstrates symbolic differentiation.)
The rule is that if any operator has a PE operand, the result will be quoted.
Sometimes we need to quote things explicitly. For instance, say we want to pass a
function to a filter that must return true if the element value is in a set.
set[_1]
is the obvious expression, but it does not give the desired result,
since it evaluates directly, giving nil
. Indexing works differently than a
binary operation like addition (set+_1 is properly quoted) so there is a need
for an explicit quoting or wrapping operation. This is the job of the _
function; the PE in this case should be _(set)[_1]
. This works for functions
as well, as a convenient alternative to registering functions: _(math.sin)(_1)
.
This is equivalent to using the `lines' method:
for line in I(_(f):read()) do print(line) end
Now this will work for any 'file-like' object which which has a read
method
returning the next line. If you had a LuaSocket client which was being 'pushed'
by lines sent from a server, then _(s):receive '*l'
would create an iterator
for accepting input. These forms can be convenient for adapting your data flow so
that it can be passed to the sequence functions in `pl.seq'.
Placeholder expressions can be mixed with sequence wrapper expressions.
lexer.lua will give us a double-valued sequence of tokens, where the first
value is a type, and the second is a value. We filter out only the values where
the type is 'iden', extract the actual value using map
, get the unique values
and finally copy to a list.
> str = 'for i=1,10 do for j = 1,10 do print(i,j) end end' > = seq(lexer.lua(str)):filter('==','iden'):map(_2):unique():copy() {i,print,j}
This is a particularly intense line (and I don't always suggest making everything
a one-liner!); the key is the behaviour of map
, which will take both values of
the sequence, so _2
returns the value part. (Since filter
here takes extra
arguments, it only operates on the type values.)
There are some performance considerations to using placeholder expressions. Instantiating a PE requires constructing and compiling a function, which is not such a fast operation. So to get best performance, factor out PEs from loops like this;
local fn = I(_1:f() + _2:g()) for i = 1,n do res[i] = tablex.map2(fn,first[i],second[i]) end