Module pl.lexer

Lexical scanner for creating a sequence of tokens from text.

lexer.scan(s) returns an iterator over all tokens found in the string s. This iterator returns two values: a token type string (such as 'string' for a quoted string, or 'iden' for an identifier) and the value of the token.

Versions specialized for Lua and C are available; these also handle block comments and classify keywords as 'keyword' tokens. For example:

> s = 'for i=1,n do'
> for t,v in lexer.lua(s)  do print(t,v) end
keyword for
iden    i
=       =
number  1
,       ,
iden    n
keyword do

See the Guide for further discussion.

Functions

scan (s, matches[, filter[, options]]) create a plain token iterator from a string or file-like object.
insert (tok, a1, a2) insert tokens into a stream.
getline (tok) get everything in a stream up to a newline.
lineno (tok) get current line number.
getrest (tok) get the rest of the stream.
get_keywords () get the Lua keywords as a set-like table.
lua (s[, filter[, options]]) create a Lua token iterator from a string or file-like object.
cpp (s[, filter[, options]]) create a C/C++ token iterator from a string or file-like object.
get_separated_list (tok[, endtoken=')'[, delim=',']]) get a list of parameters separated by a delimiter from a stream.
skipws (tok) get the next non-space token from the stream.
expecting (tok, expected_type, no_skip_ws) get the next token, which must be of the expected type.


Functions

scan (s, matches[, filter[, options]])
create a plain token iterator from a string or file-like object.

Parameters:

  • s string or file a string, or a file-like object with a :read() method that returns lines.
  • matches tab an optional match table: an array of token descriptions. A token is described by a {pattern, action} pair, where pattern should match the token body and action is a function called when a token of the described type is found.
  • filter tab a table of token types to exclude, by default {space=true} (optional)
  • options tab a table of options; by default, {number=true,string=true}, which means convert numbers and strip string quotes. (optional)
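
For example, using the default matches, filter and options (spaces are dropped and numbers converted):

> lexer = require 'pl.lexer'
> for t,v in lexer.scan('alpha = 2*beta') do print(t,v) end
iden    alpha
=       =
number  2
*       *
iden    beta
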
insert (tok, a1, a2)
insert tokens into a stream.

Parameters:

  • tok a token stream
  • a1 the token type if a string, a token list if a table, or a token-like iterator (returning type and value) if a function
  • a2 string the token value, when a1 is a type string
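
A sketch of pushing a single token back into a live stream; the values shown are the expected results:

> tok = lexer.lua('x + 1')
> print(tok())
iden    x
> lexer.insert(tok, 'keyword', 'local')
> print(tok())
keyword local
> print(tok())
+       +
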
getline (tok)
get everything in a stream up to a newline.

Parameters:

  • tok a token stream

Returns:

    a string
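
A sketch (the exact whitespace in the returned string may vary):

> tok = lexer.scan('one two\nthree')
> tok()                      -- 'iden', 'one'
> line = lexer.getline(tok)  -- rest of the first line, e.g. ' two'
> tok()                      -- 'iden', 'three'; the stream resumes on the next line
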
lineno (tok)
get current line number.

Parameters:

  • tok a token stream

Returns:

the line number. If the input source is a file-like object, also returns the column.
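
A sketch; the line numbers in the comments are the expected values:

> tok = lexer.lua('a = 1\nb = 2')
> tok(); tok(); tok()  -- consume 'a', '=' and 1 on the first line
> lexer.lineno(tok)    -- 1
> tok()                -- 'iden', 'b'
> lexer.lineno(tok)    -- 2
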
getrest (tok)
get the rest of the stream.

Parameters:

  • tok a token stream

Returns:

    a string
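
For instance:

> tok = lexer.scan('a = 1 and the rest')
> tok(); tok()          -- consume 'a' and '='
> lexer.getrest(tok)    -- the remaining input, e.g. ' 1 and the rest'
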
get_keywords ()
get the Lua keywords as a set-like table, so that res["and"] etc. would be true.

Returns:

    a table
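
For example:

> keywords = lexer.get_keywords()
> print(keywords['and'], keywords['foo'])
true    nil
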
lua (s[, filter[, options]])
create a Lua token iterator from a string or file-like object. Will return the token type and value.

Parameters:

  • s string or file the string or file-like object
  • filter tab a table of token types to exclude, by default {space=true,comments=true} (optional)
  • options tab a table of options; by default, {number=true,string=true}, which means convert numbers and strip string quotes. (optional)
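
Passing a filter without comments=true keeps comment tokens; the output should look something like:

> s = 'x = 1 -- set x'
> for t,v in lexer.lua(s, {space=true}) do print(t,v) end
iden    x
=       =
number  1
comment -- set x
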
cpp (s[, filter[, options]])
create a C/C++ token iterator from a string or file-like object. Will return the token type and value.

Parameters:

  • s string or file the string or file-like object
  • filter tab a table of token types to exclude, by default {space=true,comments=true} (optional)
  • options tab a table of options; by default, {number=true,string=true}, which means convert numbers and strip string quotes. (optional)
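
For example, with the default filter (spaces and comments dropped):

> for t,v in lexer.cpp('int x = 2;') do print(t,v) end
keyword int
iden    x
=       =
number  2
;       ;
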
get_separated_list (tok[, endtoken=')'[, delim=',']])
get a list of parameters separated by a delimiter from a stream.

Parameters:

  • tok the token stream
  • endtoken string end of the list; can be '\n' (default ')')
  • delim string the delimiter (default ',')

Returns:

    a list of token lists.
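
A sketch of reading an argument list up to the closing ')'; this assumes each token in the sublists is stored as a {type, value} pair:

> tok = lexer.lua('a, b, c) rest')
> parts = lexer.get_separated_list(tok)
> print(#parts)
3
> print(parts[2][1][1], parts[2][1][2])  -- first token of the second item
iden    b
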
skipws (tok)
get the next non-space token from the stream.

Parameters:

  • tok the token stream.
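
This is mainly useful when the stream was created with an empty filter, so that 'space' tokens are still present; a sketch:

> tok = lexer.scan('   hello', nil, {})  -- empty filter keeps space tokens
> print(lexer.skipws(tok))
iden    hello
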
expecting (tok, expected_type, no_skip_ws)
get the next token, which must be of the expected type. Throws an error if this type does not match!

Parameters:

  • tok the token stream
  • expected_type string the token type
  • no_skip_ws bool if true, do not skip whitespace before reading the token
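
A sketch of parsing a simple assignment; each call returns the token's value, and a mismatch raises an error:

> tok = lexer.lua('x = 42')
> print(lexer.expecting(tok, 'iden'))
x
> print(lexer.expecting(tok, '='))
=
> print(lexer.expecting(tok, 'number'))
42
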
generated by LDoc 1.4.6