Lil is the scripting language of Decker. Most of the time, scripts in Decker are very simple, like a script which changes cards when a user clicks on a button:
on click do go[thatCoolCard] end
For simple scripts, performance may not matter at all. Sometimes, though, performance makes the difference between accomplishing your vision and sighing in frustration while the cooling fans on your unresponsive computer attempt unsuccessfully to make it levitate. In this article we’ll work through several approaches to implementing a computationally-intensive program with Lil, and along the way learn a little bit about Vector Programming and how it can help us write scripts that are more concise and more efficient.
Feel free to follow along and try your own ideas in the Lil playground; it includes a version of the final iteration described here as the life
example program.
A cellular automaton simulates a world made up of cells. We begin with some arrangement of cells in space, and use their relationships to calculate a next generation of cells. By repeatedly applying simple rules, cellular automata can produce fascinatingly complex patterns which resemble (in some ways) patterns found in nature.
John Conway’s Game of Life is one cellular automaton which has been studied extensively, exhibiting patterns which can self-replicate, move through space, and even perform computation.
The Game of Life takes place on a two-dimensional (2D) grid composed of cells which are either “dead” (above: white) or “alive” (above: black). On each generation, each cell in the grid considers each of its 8 orthogonal and diagonal neighbors, counts the number of neighbors which are alive, and computes its state in the next generation according to the following three rules:
Let’s begin by translating the rules above into a Lil program with a straightforward, imperative approach. We will focus on writing a function named step
which is given a game board representing all the cells in our miniature universe and which is responsible for computing and returning the next generation’s board.
One way to represent a Life board in Lil is to use a dictionary, associating the (x,y)
coordinates of live cells with a value of 1
. If we try to look up a coordinate pair that isn’t set, we’ll get a 0
, which can represent dead cells. We’ll additionally define a global variable named SIZE
which contains the (width,height)
dimensions of the entire board.
As pseudocode:
next
generation.SIZE
, and for each cell:
board
.next
generation’s cell as alive or dead.Translated into Lil,
SIZE:20,20
board:()
...
on step board do
next:()
each y in range SIZE[1]
each x in range SIZE[0]
pos:x,y
around:0
each dx in -1,0,1
each dy in -1,0,1
if !(0,0)~(dx,dy)
around:around+board[pos+(dx,dy)]
end
end
end
if board[pos]
if around in 2,3 next[pos]:1 end
else
if around ~ 3 next[pos]:1 end
end
end
end
next
end
There are a few surface-level simplifications we could make to the above. For starters, we can tweak how we represent the rules to include the cell itself in our around
count, saving a conditional:
on step board do
next:()
each y in range SIZE[1]
each x in range SIZE[0]
pos:x,y
around:0
each dx in -1,0,1
each dy in -1,0,1
around:around+board[pos+(dx,dy)]
end
end
if board[pos]
if around in 3,4 next[pos]:1 end
else
if around ~ 3 next[pos]:1 end
end
end
end
next
end
Next, we’ll focus on our inner loop:
around:0
each dx in -1,0,1
each dy in -1,0,1
around:around+board[pos+(dx,dy)]
end
end
Instead of reassigning around
in each iteration of our inner loops, we can take advantage of the fact that each
loops naturally return a list of the results of their body expression and sum
those together:
around:sum each dx in -1,0,1
sum each dy in -1,0,1
board[pos+(dx,dy)]
end
end
Lil has a cross
operator which can be used to calculate the cartesian product of two lists or ranges:
3 cross 3 # ((0,0),(1,0),(2,0),(0,1),(1,1),(2,1),(0,2),(1,2),(2,2))
-1 + 3 cross 3 # ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),(1,0),(-1,1),(0,1),(1,1))
This is exactly what we’re expressing with our nested dx
/dy
loops, so with cross
we can collapse them together:
around:sum each delta in -1 + 3 cross 3
board[pos + delta]
end
A pair (pos) doesn’t conform with a list of pairs (the deltas), but it does if we flip
the list of deltas first. In this way we can add a position to an entire list of offsets at once:
deltas: -1 + 3 cross 3 # ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),(1,0),(-1,1),(0,1),(1,1))
(10,20) + deltas # ((9,9),(20,19))
(10,20) + flip deltas # ((9,10,11,9,10,11,9,10,11),(19,19,19,20,20,20,21,21,21))
flip (10,20) + flip deltas # ((9,19),(10,19),(11,19),(9,20),(10,20),(11,20),(9,21),(10,21),(11,21))
Taking advantage of this, we can “lift” the addition from the inner loop:
around:sum each x in flip pos + flip -1 + 3 cross 3
board[x]
end
Which in turn allows us to write the expression more concisely with the @
operator instead of an explicit loop:
around:sum board @ flip pos + flip -1 + 3 cross 3
The cross
operator can be used to simplify our outer loops as well:
on step board do
next:()
each pos in SIZE[0] cross SIZE[1]
around:sum board @ flip pos + flip -1 + 3 cross 3
if board[pos]
if around in 3,4 next[pos]:1 end
else
if around ~ 3 next[pos]:1 end
end
end
next
end
This is a reasonably concise and straightforward solution, but it is not particularly fast. Lil dictionaries, like all of Lil’s primitive datatypes, are immutable. Every time we write a 1
to some position in next
, the Lil interpreter constructs an entirely new dictionary with the same keys and elements plus our added key-element pair, and stores the result in next
. On a large board with many living cells, this overhead can add up to a huge amount of work! Lil’s copy-on-write semantics mean dictionary insertions are (in general) linear with respect to the size of the dictionary, rather than constant. It’s best to construct large dictionaries all at once, rather than a tiny piece at a time.
In addition to primitive types, like lists and dictionaries, Lil is furnished with a number of interfaces, which expose system resources and provide mutable data structures. The Image interface is a perfect fit for our game board: it represents a 2D array of bytes.
We don’t have to change much to use an Image instead of a dictionary; Images can also be indexed with coordinate pairs. The SIZE
constant is no longer needed, since the Image interface itself keeps track of its dimensions:
on step board do
next:image[board.size]
each pos in board.size[0] cross board.size[1]
around:sum board @ flip pos + flip -1 + 3 cross 3
if board[pos]
if around in 3,4 next[pos]:1 end
else
if around ~ 3 next[pos]:1 end
end
end
next
end
We avoided explicitly storing 0
values for dead cells in the dictionary to keep it “sparse” and minimize the number of entries required. An Image is always “dense”, so this is no longer a concern. As a result, we can simplify our conditionals a bit, taking advantage of the fact that if ... else ... end
returns the value of the taken branch:
on step board do
next:image[board.size]
each pos in board.size[0] cross board.size[1]
around:sum board @ flip pos + flip -1 + 3 cross 3
next[pos]:if board[pos]
around in 3,4
else
around~3
end
end
next
end
In fact, we could do without conditionals entirely by reworking the above into a single logical expression:
on step board do
next:image[board.size]
each pos in board.size[0] cross board.size[1]
around:sum board @ flip pos + flip -1 + 3 cross 3
next[pos]:(board[pos] & around in 3,4) | around~3
end
next
end
And we could hoist the assignment of every cell out of the loop by using image.pixels
to perform a bulk write. With writes occurring only when all reads have finished, we could even redesign the function to modify board
in-place without need for a next
buffer:
on step board do
board.pixels:each pos in board.size[0] cross board.size[1]
around:sum board @ flip pos + flip -1 + 3 cross 3
(board[pos] & around in 3,4) | around~3
end
end
This Image-based solution is significantly shorter, and a good bit faster. Instead of rebuilding the board dictionary for every write, the Lil interpreter only needs to allocate a fresh Image once at the beginning of the program. In Decker, the Image-based board representation has the additional benefit that we can easily display an Image graphically in the Listener or by pasting it to a Canvas Widget, or even the Card background Image.
Lil has many features inherited from Vector-oriented programming languages (sometimes also called “Array Languages”). The essence of vector programming is learning to think about manipulating entire data structures at once, in parallel, rather than serially.
For example, if we wanted to multiply a list by a scalar number in Lil we could write a loop which processes list elements serially:
each v in mylist
3*v
end
But the vector-oriented approach is to take advantage of the parallelism built into the *
operator:
3*mylist
The Image interface includes a number of functions for doing bulk operations on its pixels which we can take advantage of:
The image.map[]
function takes a dictionary as input, and updates every pixel in the Image by looking it up through the dictionary. For convenience, image.map[]
also accepts an optional second argument giving a “default” value for pixels not found in the keys of the dictionary. The main purpose of image.map[]
is swapping colors in graphics, but it can also be used to express any pure unary function of pixels. For example, given an Image x
containing pixels with patterns 0
or 1
, x.map[(0,1) dict (1,0)]
applies a logical NOT
to the Image, and x.map[(3,5) dict (1,1) 0]
is loosely equivalent to x in 3,5
.
The image.translate[]
function shifts an Image’s pixels by a specified offset. For example, x.translate[-1,2]
shifts every pixel of an Image x
one pixel left and two pixels down. By default, pixels “shifted in” are always pattern 0
and pixels “shifted out” are destroyed, but image.translate[]
also accepts an optional second argument which can indicate shifted pixels should “wrap” around opposite edges.
The image.merge[]
function is for compositing Images together. One way of merging Images is to specify a binary operator ("+"
, "-"
, "|"
(maximum), "&"
(minimum), ">"
, "<"
, or "="
) and a source Image. The pixels of both the source and target Image are paired up, the operator is applied to each pair, and the result is written back to the target Image.
Since all of these operations modify the target Image in-place, we may need to use image.copy[]
before applying them, to avoid destroying a dataset we need again later.
The inner loop of our current solution probes the board for the neighbors of each individual pixel by adding an offset to the coordinates of the central pixel. If we instead imagine using image.translate[]
, we could apply each of those offsets to the entire Image at once, and then use image.merge["+" ...]
to sum all of the live cells in the translated Images together.
We were already able to convert the if ... else ... end
statement in our outer loop into a logical expression that uses in
, ~
, &
and |
along with some constant numbers; given our original board
Image and the neighbor-count Image, we can perform an equivalent calculation by using image.merge[]
and image.map[]
. Bringing it all together,
on step board do
around:image[board.size]
each delta in -1 + 3 cross 3
around.merge["+" board.copy[].translate[delta]]
end
stay:around.copy[].map[((4,3) dict 1,1) 0].merge["&" board]
next:around .map[( 3 dict 1) 0].merge["|" stay ]
end
Give yourself a few moments to enjoy this solution. It describes the rules of Life in parallel, applying operations in bulk across the entire game board. The only explicit loop in this program iterates exactly nine times, no matter the size of the board. If we wanted, we could even “unroll” this loop and skip the (0,0)
translation:
on step board do
around:board.copy[]
around.merge["+" board.copy[].translate[-1,-1]]
around.merge["+" board.copy[].translate[-1, 0]]
around.merge["+" board.copy[].translate[-1, 1]]
around.merge["+" board.copy[].translate[ 0,-1]]
around.merge["+" board.copy[].translate[ 0, 1]]
around.merge["+" board.copy[].translate[ 1,-1]]
around.merge["+" board.copy[].translate[ 1, 0]]
around.merge["+" board.copy[].translate[ 1, 1]]
stay:around.copy[].map[((4,3) dict 1,1) 0].merge["&" board]
next:around .map[( 3 dict 1) 0].merge["|" stay ]
end
While this is bulkier, avoiding the overhead of the loop and cross product is slightly more efficient, and it is clearer to a reader that the entire function is a straight “pipeline” of Image-manipulation operations.
With some additional pondering (or at least a perusal of the relevant literature), we might arrive at a slightly more elegant approach first attributed to Donald Knuth in the context of his METAFONT language. The key idea is to reorder the translations so that several neighbors can be summed together at the same time, and to double-count the cells surrounding the central cell, which simplifies the update rule:
on step board do
horiz:board.copy[]
.merge["+" board.copy[].translate[-1,0]]
.merge["+" board.copy[].translate[ 1,0]]
vert:horiz.copy[]
.merge["+" horiz.copy[].translate[0,-1]]
.merge["+" horiz.copy[].translate[0, 1]]
next:vert.copy[]
.merge["+" vert.merge["-" board]]
.map[((5,6,7) dict 1,1,1) 0]
end
All three of these formulations are tremendously faster than our earlier attempts, and scale better to larger boards. Instead of leaving the Lil interpreter to do most of the heavy lifting, iterating over and making decisions about individual pixels one at a time, we have pushed all of the work into image.translate[]
, image.map[]
and image.merge[]
, which contain tight loops written in C (or JS, in Web-Decker).
Let’s compare the runtime for several approaches to Life:
For each implementation, we will initialize an appropriately-sized board with random cells. Lilt’s random[]
number generator is seeded consistently by default, so for a given board size each run will have the same configuration and final result. We will then step
the board for 100 generations and print the final board. Each overall execution time will be measured on the same laptop with the unix time
utility, taking the best total
time (in seconds) of three attempts:
Approach | 20x20 board | 40x40 board | 60x60 board |
---|---|---|---|
Dictionary, Naive | 1.683 | 16.900 | 80.980 |
Dictionary, Simplified | 0.489 | 3.925 | 38.885 |
Image, Iterative | 0.329 | 1.262 | 2.838 |
Image, Vector | 0.013 | 0.025 | 0.070 |
The usual lessons for improving program performance apply: Avoid unnecessary or redundant work, choose appropriate data structures, and parallelize.
Throughout our examples, we have focused on a version of Life which implicitly fixes the edges of the board as dead. What modifications to each implementation would be necessary to instead simulate Life on a toroid, in which cells are considered adjacent across opposing edges? How does this impact performance and the overall elegance of the code?
Brian’s Brain is another interesting 2D cellular automaton. It uses 3 cell states (“Live”, “Dying”, and “Dead”) and slightly different rules for how cells change state. Try modifying our Life programs to simulate Brian’s Brain instead.
Rule 110 is a 1D cellular automaton which (with certain constraints and initial conditions) is capable of universal computation. Try your hand at writing an implementation! Using an Image to store the board state works, but you could also explore using Lil’s Array interface.