Mapping and Lua Iterators
Introduction to simple iterators in Lua
Let us begin by considering what a simple iterator is. In Lua, a simple iterator is a function. Each time you call the function, you get back the next value in the sequence. In the following example, f
is (assumed to be) a simple iterator:
local f = ("cat"):gmatch(".")
print(f()) --> c
print(f()) --> a
print(f()) --> t
[Technical note: In the reference implementation of Lua, string.gmatch
returns a simple iterator, but it is questionable whether or not this is mandated by the language. In particular, LuaJIT returns a full iterator rather than a simple iterator; see later]
As Lua functions can return multiple values, it seems logical to allow iterators to produce multiple values per call, and this is indeed allowed. In the following example, f
is (assumed to be) a simple iterator which produces two values per call:
local f = ("The Cute Cat"):gmatch("(%u)(%l*)")
print(f()) --> T, he
print(f()) --> C, ute
print(f()) --> C, at
To signal that it has finished producing values, an iterator returns nil
. Well, nearly; an iterator can signal completion by returning no values, by returning a single nil
, or by returning multiple values with the first value being nil
. With this in mind, the previous example can be rewritten to use a loop rather than an explicit number of calls to f
:
local f = ("The Cute Cat"):gmatch("(%u)(%l*)")
while true do
local first, rest = f()
if first == nil then
break
end
print(first, rest)
end
This is rather verbose syntax, and thankfully the generic for
loop can be used to achieve the same effect:
local f = ("The Cute Cat"):gmatch("(%u)(%l*)")
for first, rest in f do
print(first, rest)
end
In the previous examples, string.gmatch
is termed a simple iterator factory, as it returns a simple iterator (f
). Equipped with this knowledge of how simple iterators work, we can write a simple iterator factory which emulates the numeric for
loop:
function range(init, limit, step)
step = step or 1
return function()
local value = init
init = init + step
if limit * step >= value * step then
return value
end
end
end
This can then be used like so:
for n in range(3, 0, -1) do print(n) end --> 3; 2; 1; 0
Introduction to full iterators in Lua
Looking at range
in slightly more detail, we see that every time it is called, it returns a function, and that the function has three upvalues (init
, limit
, and step
). From an esoteric efficiency point of view, it would be nice to have less upvalues (or ideally no upvalues at all). Full iterators are one way of achieving this. Whereas a simple iterator is just a function (f
), a full iterator is a three-tuple (f,s,v
). As with a simple iterator, f
is still a function (or something with a __call
metamethod). As for the two new bits, s
is an opaque value which is passed as the first parameter to each call of f
, and v
is an opaque value which is passed as the second parameter to the first call of f
. Subsequent calls of f
pass the first return value from the previous call of f
as the the second parameter.
Written out formally like that, full iterators sound a little bit odd, but once you've wrapped your head around it, they are really quite reasonable. As an example, let us rewrite range
to be a full iterator factory rather than a simple iterator factory. For this, we'll take s
to be limit
, and v
to be init - step
, like so:
function range(init, limit, step)
step = step or 1
return function(lim, value)
value = value + step
if lim * step >= value * step then
return value
end
end, limit, init - step
end
As the generic for
loop is designed to work with full iterators, this version of range
can be used just like the previous version:
for n in range(3, 0, -1) do print(n) end --> 3; 2; 1; 0
Alternatively, to appreciate what is going on, we could use a verbose loop instead:
local f, s, v = range(3, 0, -1)
while true do
v = f(s, v)
if v == nil then
break
end
print(v)
end
Looking at this version of range
, the returned function only has one upvalue (step
). In real terms, this means that 32 less bytes of space are allocated on the heap as compared to the three upvalue version. Admittedly this isn't much, and it hardly justifies the language support for full iterators. Rather more convincing are the pairs
and ipairs
full iterator factories from the Lua standard library. The full iterator machinery is precisely what these two functions need in order for their iterator functions to have no upvalues at all, meaning that you can iterate through a table without having to allocate anything on the heap, as in the following example:
local t = {
Lemon = "sour",
Cake = "nice",
}
for ingredient, taste in pairs(t) do
print(ingredient .." is ".. taste)
end
--> Lemon is sour
--> Cake is nice
Mapping Lua iterators
Switching from Lua to Haskell for just a minute, recall the type signature of the map
function:
map :: (a -> b) -> [a] -> [b]
This takes a function as its first parameter, and applies this function to every element of a list in order to generate a new list. Due to Haskell's lazy evaluation behaviour, it isn't unreasonable to equate Haskell lists with iterators in other languages. With that in mind, a first attempt at a map
function for Lua iterators might be:
function map(mapf, f, ...)
return function(...)
return mapf(f(...))
end, ...
end
For all intents and purposes, this is just dressed up function composition. For some cases, like the following, it even works:
local f = map(string.upper, ("cat"):gmatch("."))
print(f()) --> C
print(f()) --> A
print(f()) --> T
Unfortunately, the following example doesn't work quite as well:
local t = {
Lemon = "sour",
Cake = "nice",
}
for ingredient, taste in map(function(a, b)
return a:lower(), b:upper()
end, pairs(t)) do
print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> invalid key to 'next'
If you've understood how full iterators work, then you should be able to appreciate why this example fails after the first iteration: the map function is changing the results from the iterator function, and the first of these changed results is being passed back to the iterator function, which confuses it. To solve this issue, we need a slightly more complicated construction:
function map(mapf, f, s, v)
local function domap(...)
v = ...
if v ~= nil then
return mapf(...)
end
end
return function()
return domap(f(s, v))
end
end
With this construction, we get the intended behaviour:
local t = {
Lemon = "sour",
Cake = "nice",
}
for ingredient, taste in map(function(a, b)
return a:lower(), b:upper()
end, pairs(t)) do
print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> cake is NICE
ConcatMapping Lua iterators
Jumping back to Haskell again, consider the slightly more complex concatMap
function:
concatMap :: (a -> [b]) -> [a] -> [b]
This allows the iterator function to produce zero or more results for each value of input, for example:
module Main where
import Data.Char
main = putStr $
concatMap (\c -> if isUpper c then [c] else [c,c]) "Cat" -- > "Caatt"
We could perform a direct conversion of this to Lua, and have the map function return an iterator, but unlike Haskell's syntactic support for constructing lists, Lua has none for constructing iterators, and so the result would feel rather clunky. The central issue is deciding how to return multiple values, or infact multiple tuples, from the map function. In Haskell it is natural to do this with lists, whereas in Lua we can achieve it with coroutines. Consider the following even more complex form of the map
function in Lua:
function map(mapf, f, s, v)
local done
local function maybeyield(...)
if ... ~= nil then
coroutine.yield(...)
end
end
local function domap(...)
v = ...
if v ~= nil then
return maybeyield(mapf(...))
else
done = true
end
end
return coroutine.wrap(function()
repeat
domap(f(s, v))
until done
end)
end
This version of map
still permits the previous example:
local t = {
Lemon = "sour",
Cake = "nice",
}
for ingredient, taste in map(function(a, b)
return a:lower(), b:upper()
end, pairs(t)) do
print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> cake is NICE
Furthermore, it supports the generation of multiple tuples for each input by calling coroutine.yield
for each additional tuple:
local t = {
Lemon = "sour",
Cake = "nice",
}
for ingredient, taste in map(function(a, b)
coroutine.yield(a:lower(), b:upper())
return a, "very ".. b
end, pairs(t)) do
print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> Lemon is very sour
--> cake is NICE
--> Cake is very nice
Alternatively, if the map function doesn't return, then it doesn't generate anything at all:
for n in map(function(n)
if n ~= 0 then
return 1 / n
end
end, range(-2, 2)) do
print(n)
end
--> -0.5; -1; 1; 0.5