Introduction

Purpose

It is often said of Lua that it does not include batteries. That is because the goal of Lua is to produce a lean expressive language that will be used on all sorts of machines, (some of which don't even have hierarchical filesystems). The Lua language is the equivalent of an operating system kernel; the creators of Lua do not see it as their responsibility to create a full software ecosystem around the language. That is the role of the community.

A principle of software design is to recognize common patterns and reuse them. If you find yourself writing things like `io.write(string.format('the answer is %d ',42))` more than a number of times then it becomes useful just to define a function printf. This is good, not just because repeated code is harder to maintain, but because such code is easier to read, once people understand your libraries.

Penlight captures many such code patterns, so that the intent of your code becomes clearer. For instance, a Lua idiom to copy a table is {unpack(t)}, but this will only work for 'small' tables (for a given value of 'small') so it is not very robust. Also, the intent is not clear. So tablex.deepcopy is provided, which will also copy nested tables and and associated metatables, so it can be used to clone complex objects.

The default error handling policy follows that of the Lua standard libraries: if a argument is the wrong type, then an error will be thrown, but otherwise we return nil,message if there is a problem. There are some exceptions; functions like input.fields default to shutting down the program immediately with a useful message. This is more appropriate behaviour for a script than providing a stack trace. (However, this default can be changed.) The lexer functions always throw errors, to simplify coding, and so should be wrapped in pcall.

If you are used to Python conventions, please note that all indices consistently start at 1.

The Lua function table.foreach has been deprecated in favour of the for in statement, but such an operation becomes particularly useful with the higher-order function support in Penlight. Note that tablex.foreach reverses the order, so that the function is passed the value and then the key. Although perverse, this matches the intended use better.

The only important external dependence of Penlight is LuaFileSystem (lfs), and if you want dir.copyfile to work cleanly on Windows, you will need either alien or be using LuaJIT as well. (The fallback is to call the equivalent shell commands.)

To Inject or not to Inject?

It was realized a long time ago that large programs needed a way to keep names distinct by putting them into tables (Lua), namespaces (C++) or modules (Python). It is obviously impossible to run a company where everyone is called 'Bruce', except in Monty Python skits. These 'namespace clashes' are more of a problem in a simple language like Lua than in C++, because C++ does more complicated lookup over 'injected namespaces'. However, in a small group of friends, 'Bruce' is usually unique, so in particular situations it's useful to drop the formality and not use last names. It depends entirely on what kind of program you are writing, whether it is a ten line script or a ten thousand line program.

So the Penlight library provides the formal way and the informal way, without imposing any preference. You can do it formally like:

local utils = require 'pl.utils'
utils.printf("%s\n","hello, world!")

or informally like:

require 'pl'
utils.printf("%s\n","That feels better")

require 'pl' makes all the separate Penlight modules available, without needing to require them each individually.

Generally, the formal way is better when writing modules, since then there are no global side-effects and the dependencies of your module are made explicit.

Andrew Starks has contributed another way, which balances nicely between the formal need to keep the global table uncluttered and the informal need for convenience. require'pl.import_into' returns a function, which accepts a table for injecting Penlight into, or if no table is given, it passes back a new one.

local pl = require'pl.import_into'()

The table pl is a 'lazy table' which loads modules as needed, so we can then use pl.utils.printf and so forth, without an explicit `require' or harming any globals.

If you are using _ENV with Lua 5.2 to define modules, then here is a way to make Penlight available within a module:

local _ENV,M = require 'pl.import_into' ()

function answer ()
    -- all the Penlight modules are available!
    return pretty.write(utils.split '10 20  30', '')
end

return M

The default is to put Penlight into \_ENV, which has the unintended effect of making it available from the module (much as module(...,package.seeall) does). To satisfy both convenience and safety, you may pass true to this function, and then the module M is not the same as \_ENV, but only contains the exported functions.

Otherwise, Penlight will not bring in functions into the global table, or clobber standard tables like 'io'. require('pl') will bring tables like 'utils','tablex',etc into the global table if they are used. This 'load-on-demand' strategy ensures that the whole kitchen sink is not loaded up front, so this method is as efficient as explicitly loading required modules.

You have an option to bring the pl.stringx methods into the standard string table. All strings have a metatable that allows for automatic lookup in string, so we can say s:upper(). Importing stringx allows for its functions to also be called as methods: s:strip(),etc:

require 'pl'
stringx.import()

or, more explicitly:

require('pl.stringx').import()

A more delicate operation is importing tables into the local environment. This is convenient when the context makes the meaning of a name very clear:

> require 'pl'
> utils.import(math)
> = sin(1.2)
0.93203908596723

utils.import can also be passed a module name as a string, which is first required and then imported. If used in a module, import will bring the symbols into the module context.

Keeping the global scope simple is very necessary with dynamic languages. Using global variables in a big program is always asking for trouble, especially since you do not have the spell-checking provided by a compiler. The pl.strict module enforces a simple rule: globals must be 'declared'. This means that they must be assigned before use; assigning to nil is sufficient.

> require 'pl.strict'
> print(x)
stdin:1: variable 'x' is not declared
> x = nil
> print(x)
nil

The strict module provided by Penlight is compatible with the 'load-on-demand' scheme used by require 'pl.

strict also disallows assignment to global variables, except in the main program. Generally, modules have no business messing with global scope; if you must do it, then use a call to rawset. Similarly, if you have to check for the existence of a global, use rawget.

If you wish to enforce strictness globally, then just add require 'pl.strict' at the end of pl/init.lua, otherwise call it from your main program.

As from 1.1.0, this module provides a strict.module function which creates (or modifies) modules so that accessing an unknown function or field causes an error.

For example,

-- mymod.lua
local strict = require 'pl.strict'
local M = strict.module (...)

function M.answer ()
    return 42
end

return M

If you were to accidentally type mymod.Answer(), then you would get a runtime error: "variable 'Answer' is not declared in 'mymod'".

This can be applied to existing modules. You may desire to have the same level of checking for the Lua standard libraries:

strict.make_all_strict(_G)

Thereafter a typo such as math.cosine will give you an explicit error, rather than merely returning a nil that will cause problems later.

What are function arguments in Penlight?

Many functions in Penlight themselves take function arguments, like map which applies a function to a list, element by element. You can use existing functions, like math.max, anonymous functions (like `function(x,y) return x > y end ), or operations by name (e.g '*' or '..'). The module pl.operator` exports all the standard Lua operations, like the Python module of the same name. Penlight allows these to be referred to by name, so operator.gt can be more concisely expressed as '>'.

Note that the map functions pass any extra arguments to the function, so we can have ls:filter('>',0), which is a shortcut for ls:filter(function(x) return x > 0 end).

Finally, pl.func supports placeholder expressions in the Boost lambda style, so that an anonymous function to multiply the two arguments can be expressed as \1*\2.

To use them directly, note that all function arguments in Penlight go through utils.function_arg. pl.func registers itself with this function, so that you can directly use placeholder expressions with standard methods:

> _1 = func._1
> = List{10,20,30}:map(_1+1)
{11,21,31}

Another option for short anonymous functions is provided by utils.string_lambda; this is invoked automatically:

> = List{10,20,30}:map '|x| x + 1'
{11,21,31}

Pros and Cons of Loopless Programming

The standard loops-and-ifs 'imperative' style of programming is dominant, and often seems to be the 'natural' way of telling a machine what to do. It is in fact very much how the machine does things, but we need to take a step back and find ways of expressing solutions in a higher-level way. For instance, applying a function to all elements of a list is a common operation:

local res = {}
for i = 1,#ls do
    res[i] = fun(ls[i])
end

This can be efficiently and succinctly expressed as ls:map(fun). Not only is there less typing but the intention of the code is clearer. If readers of your code spend too much time trying to guess your intention by analyzing your loops, then you have failed to express yourself clearly. Similarly, ls:filter('>',0) will give you all the values in a list greater than zero. (Of course, if you don't feel like using List, or have non-list-like tables, then pl.tablex offers the same facilities. In fact, the List methods are implemented using tablex functions.)

A common observation is that loopless programming is less efficient, particularly in the way it uses memory. ls1:map2('*',ls2):reduce '+' will give you the dot product of two lists, but an unnecessary temporary list is created. But efficiency is relative to the actual situation, it may turn out to be fast enough, or may not appear in any crucial inner loops, etc.

Writing loops is 'error-prone and tedious', as Stroustrup says. But any half-decent editor can be taught to do much of that typing for you. The question should actually be: is it tedious to read loops? As with natural language, programmers tend to read chunks at a time. A for-loop causes no surprise, and probably little brain activity. One argument for loopless programming is the loops that you do write stand out more, and signal 'something different happening here'. It should not be an all-or-nothing thing, since most programs require a mixture of idioms that suit the problem. Some languages (like APL) do nearly everything with map and reduce operations on arrays, and so solutions can sometimes seem forced. Wisdom is knowing when a particular idiom makes a particular problem easy to solve and the solution easy to explain afterwards.

Generally useful functions.

The function printf discussed earlier is included in pl.utils because it makes properly formatted output easier. (There is an equivalent fprintf which also takes a file object parameter, just like the C function.)

Splitting a string using a delimiter is a fairly common operation, hence split.

Utility functions like is_type help with identifying what kind of animal you are dealing with. The Lua type function handles the basic types, but can't distinguish between different kinds of objects, which are all tables. So is_type handles both cases, like is_type(s,"string") and is_type(ls,List).

A common pattern when working with Lua varargs is capturing all the arguments in a table:

function t(...)
    local args = {...}
    ...
end

But this will bite you someday when nil is one of the arguments, since this will put a 'hole' in your table. In particular, #ls will only give you the size upto the nil value. Hence the need for table.pack - this is a new Lua 5.2 function which Penlight defines also for Lua 5.1.

function t(...)
    local args,n = table.pack(...)
    for i = 1,n do
      ...
    end
end

The 'memoize' pattern occurs when you have a function which is expensive to call, but will always return the same value subsequently. utils.memoize is given a function, and returns another function. This calls the function the first time, saves the value for that argument, and thereafter for that argument returns the saved value. This is a more flexible alternative to building a table of values upfront, since in general you won't know what values are needed.

sum = utils.memoize(function(n)
    local sum = 0
    for i = 1,n do sum = sum + i end
    return sum
end)
...
s = sum(1e8) --takes time!
...
s = sum(1e8) --returned saved value!

Penlight is fully compatible with Lua 5.1, 5.2 and LuaJIT 2. To ensure this, utils also defines the global Lua 5.2 load function as utils.load

  • the input (either a string or a function)
  • the source name used in debug information
  • the mode is a string that can have either or both of 'b' or 't', depending on whether the source is a binary chunk or text code (default is 'bt')
  • the environment for the compiled chunk

Using utils.load should reduce the need to call the deprecated function setfenv, and make your Lua 5.1 code 5.2-friendly.

The utils module exports getfenv and setfenv for Lua 5.2 as well, based on code by Sergey Rozhenko. Note that these functions can fail for functions which don't access any globals.

Application Support

app.parse_args is a simple command-line argument parser. If called without any arguments, it tries to use the global arg array. It returns the flags (options beginning with '-') as a table of name/value pairs, and the arguments as an array. It knows about long GNU-style flag names, e.g. --value, and groups of short flags are understood, so that -ab is short for -a -b. The flags result would then look like {value=true,a=true,b=true}.

Flags may take values. The command-line --value=open -n10 would result in {value='open',n='10'}; generally you can use '=' or ':' to separate the flag from its value, except in the special case where a short flag is followed by an integer. Or you may specify upfront that some flags have associated values, and then the values will follow the flag.

> require 'pl'
> flags,args = app.parse_args({'-o','fred','-n10','fred.txt'},{o=true})
> pretty.dump(flags)
{o='fred',n='10'}

parse_args is not intelligent or psychic; it will not convert any flag values or arguments for you, or raise errors. For that, have a look at Lapp.

An application which consists of several files usually cannot use require to load files in the same directory as the main script. app.require_here() ensures that the Lua module path is modified so that files found locally are found first. In the examples directory, test-symbols.lua uses this function to ensure that it can find symbols.lua even if it is not run from this directory.

app.appfile will create a filename that your application can use to store its private data, based on the script name. For example, app.appfile "test.txt" from a script called testapp.lua produces the following file on my Windows machine:

C:\Documents and Settings\SJDonova\.testapp\test.txt

and the equivalent on my Linux machine:

/home/sdonovan/.testapp/test.txt

If .testapp does not exist, it will be created.

Penlight makes it convenient to save application data in Lua format. You can use pretty.dump(t,file) to write a Lua table in a human-readable form to a file, and pretty.read(file.read(file)) to generate the table again, using the pretty module.

Simplifying Object-Oriented Programming in Lua

Lua is similar to JavaScript in that the concept of class is not directly supported by the language. In fact, Lua has a very general mechanism for extending the behaviour of tables which makes it straightforward to implement classes. A table's behaviour is controlled by its metatable. If that metatable has a \\index function or table, this will handle looking up anything which is not found in the original table. A class is just a table with an __index key pointing to itself. Creating an object involves making a table and setting its metatable to the class; then when handling obj.fun, Lua first looks up fun in the table obj, and if not found it looks it up in the class. obj:fun(a) is just short for obj.fun(obj,a). So with the metatable mechanism and this bit of syntactic sugar, it is straightforward to implement classic object orientation.

-- animal.lua

class = require 'pl.class'

class.Animal()

function Animal:_init(name)
    self.name = name
end

function Animal:__tostring()
  return self.name..': '..self:speak()
end

class.Dog(Animal)

function Dog:speak()
  return 'bark'
end

class.Cat(Animal)

function Cat:_init(name,breed)
    self:super(name)  -- must init base!
    self.breed = breed
end

function Cat:speak()
  return 'meow'
end

class.Lion(Cat)

function Lion:speak()
  return 'roar'
end

fido = Dog('Fido')
felix = Cat('Felix','Tabby')
leo = Lion('Leo','African')

$ lua -i animal.lua
> = fido,felix,leo
Fido: bark      Felix: meow     Leo: roar
> = leo:is_a(Animal)
true
> = leo:is_a(Dog)
false
> = leo:is_a(Cat)
true

All Animal does is define \\tostring, which Lua will use whenever a string representation is needed of the object. In turn, this relies on speak, which is not defined. So it's what C++ people would call an abstract base class; the specific derived classes like Dog define speak. Please note that if derived classes have their own constructors, they must explicitly call the base constructor for their base class; this is conveniently available as the super method.

Note that (as always) there are multiple ways to implement OOP in Lua; this method uses the classic 'a class is the __index of its objects' but does 'fat inheritance'; methods from the base class are copied into the new class. The advantage of this is that you are not penalized for long inheritance chains, for the price of larger classes, but generally objects outnumber classes! (If not, something odd is going on with your design.)

All such objects will have a is_a method, which looks up the inheritance chain to find a match. Another form is class_of, which can be safely called on all objects, so instead of leo:is_a(Animal) one can say Animal:class_of(leo).

There are two ways to define a class, either class.Name() or Name = class(); both work identically, except that the first form will always put the class in the current environment (whether global or module); the second form provides more flexibility about where to store the class. The first form does name the class by setting the _name field, which can be useful in identifying the objects of this type later. This session illustrates the usefulness of having named classes, if no __tostring method is explicitly defined.

> class.Fred()
> a = Fred()
> = a
Fred: 00459330
> Alice = class()
> b = Alice()
> = b
table: 00459AE8
> Alice._name = 'Alice'
> = b
Alice: 00459AE8

So Alice = class(); Alice._name = 'Alice' is exactly the same as class.Alice().

This useful notation is borrowed from Hugo Etchegoyen's classlib which further extends this concept to allow for multiple inheritance. Notice that the more convenient form puts the class name in the current environment! That is, you may use it safely within modules using the old-fashioned module() or the new _ENV mechanism.

There is always more than one way of doing things in Lua; some may prefer this style for creating classes:

local class = require 'pl.class'

class.Named {
    _init = function(self,name)
        self.name = name
    end;

    __tostring = function(self)
        return 'boo '..self.name
    end;
}

b = Named 'dog'
print(b)
--> boo dog

Note that you have to explicitly declare self and end each function definition with a semi-colon or comma, since this is a Lua table. To inherit from a base class, set the special field _base to the class in this table.

Penlight provides a number of useful classes; there is List, which is a Lua clone of the standard Python list object, and Set which represents sets. There are three kinds of map defined: Map, MultiMap (where a key may have multiple values) and OrderedMap (where the order of insertion is remembered.). There is nothing special about these classes and you may inherit from them.

A powerful thing about dynamic languages is that you can redefine existing classes and functions, which is often called 'monkey patching' It's entertaining and convenient, but ultimately anti-social; you may modify List but then any other modules using this shared resource can no longer be sure about its behaviour. (This is why you must say stringx.import() explicitly if you want the extended string methods - it would be a bad default.) Lua is particularly open to modification but the community is not as tolerant of monkey-patching as the Ruby community, say. You may wish to add some new methods to List? Cool, but that's what subclassing is for.

class.Strings(List)

function Strings:my_method()
...
end

It's definitely more useful to define exactly how your objects behave in unknown conditions. All classes have a catch method you can use to set a handler for unknown lookups; the function you pass looks exactly like the __index metamethod.

Strings:catch(function(self,name)
    return function() error("no such method "..name,2) end
end)

In this case we're just customizing the error message, but creative things can be done. Consider this code from test-vector.lua:

Strings:catch(List.default_map_with(string))

ls = Strings{'one','two','three'}
asserteq(ls:upper(),{'ONE','TWO','THREE'})
asserteq(ls:sub(1,2),{'on','tw','th'})

So we've converted a unknown method invocation into a map using the function of that name found in string. So for a Vector (which is a specialization of List for numbers) it makes sense to make math the default map so that v:sin() makes sense.

Note that map operations return a object of the same type - this is often called covariance. So ls:upper() itself returns a Strings object.

This is not always what you want, but objects can always be cast to the desired type. (cast doesn't create a new object, but returns the object passed.)

local sizes = ls:map '#'
asserteq(sizes, {3,3,5})
asserteq(utils.type(sizes),'Strings')
asserteq(sizes:is_a(Strings),true)
sizes = Vector:cast(sizes)
asserteq(utils.type(sizes),'Vector')
asserteq(sizes+1,{4,4,6})

About utils.type: it can only return a string for a class type if that class does in fact have a _name field.

Properties are a useful object-oriented pattern. We wish to control access to a field, but don't wish to force the user of the class to say obj:get_field() etc. This excerpt from tests/test-class.lua shows how it is done:

local MyProps = class(class.properties)
local setted_a, got_b

function MyProps:_init ()
    self._a = 1
    self._b = 2
end

function MyProps:set_a (v)
    setted_a = true
    self._a = v
end

function MyProps:get_b ()
    got_b = true
    return self._b
end

local mp = MyProps()

mp.a = 10

asserteq(mp.a,10)
asserteq(mp.b,2)
asserteq(setted_a and got_b, true)

The convention is that the internal field name is prefixed with an underscore; when reading mp.a, first a check for an explicit getter get_a and then only look for _a. Simularly, writing mp.a causes the setter set_a to be used.

This is cool behaviour, but like much Lua metaprogramming, it is not free. Method lookup on such objects goes through \\index as before, but now \\index is a function which has to explicitly look up methods in the class, before doing any property indexing, which is not going to be as fast as field lookup. If however, your accessors actually do non-trivial things, then the extra overhead could be worth it.

This is not really intended for access control because external code can write to mp._a directly. It is possible to have this kind of control in Lua, but it again comes with run-time costs.

generated by LDoc 1.5.0