Lil is the scripting language of Decker. Most of the time, scripts in Decker are very simple, like a script which changes cards when a user clicks on a button:
on click do go[thatCoolCard] end
For simple scripts, performance may not matter at all. Sometimes, though, performance makes the difference between accomplishing your vision and sighing in frustration while the cooling fans on your unresponsive computer attempt unsuccessfully to make it levitate. In this article we’ll work through several approaches to implementing a computationally-intensive program with Lil, and along the way learn a little bit about Vector Programming and how it can help us write scripts that are more concise and more efficient.
Feel free to follow along and try your own ideas in the Lil playground; it includes a version of the final iteration described here as the
life example program.
A cellular automaton simulates a world made up of cells. We begin with some arrangement of cells in space, and use their relationships to calculate a next generation of cells. By repeatedly applying simple rules, cellular automata can produce fascinatingly complex patterns which resemble (in some ways) patterns found in nature.
John Conway’s Game of Life is one cellular automaton which has been studied extensively, exhibiting patterns which can self-replicate, move through space, and even perform computation.
The Game of Life takes place on a two-dimensional (2D) grid composed of cells which are either “dead” (above: white) or “alive” (above: black). On each generation, each cell in the grid considers each of its 8 orthogonal and diagonal neighbors, counts the number of neighbors which are alive, and computes its state in the next generation according to the following three rules:
Let’s begin by translating the rules above into a Lil program with a straightforward, imperative approach. We will focus on writing a function named
step which is given a game board representing all the cells in our miniature universe and which is responsible for computing and returning the next generation’s board.
One way to represent a Life board in Lil is to use a dictionary, associating the
(x,y) coordinates of live cells with a value of
1. If we try to look up a coordinate pair that isn’t set, we’ll get a
0, which can represent dead cells. We’ll additionally define a global variable named
SIZE which contains the
(width,height) dimensions of the entire board.
SIZE, and for each cell:
nextgeneration’s cell as alive or dead.
Translated into Lil,
SIZE:20,20 board:() ... on step board do next:() each y in range SIZE each x in range SIZE pos:x,y around:0 each dx in -1,0,1 each dy in -1,0,1 if !(0,0)~(dx,dy) around:around+board[pos+(dx,dy)] end end end if board[pos] if around in 2,3 next[pos]:1 end else if around ~ 3 next[pos]:1 end end end end next end
There are a few surface-level simplifications we could make to the above. For starters, we can tweak how we represent the rules to include the cell itself in our
around count, saving a conditional:
on step board do next:() each y in range SIZE each x in range SIZE pos:x,y around:0 each dx in -1,0,1 each dy in -1,0,1 around:around+board[pos+(dx,dy)] end end if board[pos] if around in 3,4 next[pos]:1 end else if around ~ 3 next[pos]:1 end end end end next end
Next, we’ll focus on our inner loop:
around:0 each dx in -1,0,1 each dy in -1,0,1 around:around+board[pos+(dx,dy)] end end
Instead of reassigning
around in each iteration of our inner loops, we can take advantage of the fact that
each loops naturally return a list of the results of their body expression and
sum those together:
around:sum each dx in -1,0,1 sum each dy in -1,0,1 board[pos+(dx,dy)] end end
Lil has a
cross operator which can be used to calculate the cartesian product of two lists or ranges:
3 cross 3 # ((0,0),(1,0),(2,0),(0,1),(1,1),(2,1),(0,2),(1,2),(2,2)) -1 + 3 cross 3 # ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),(1,0),(-1,1),(0,1),(1,1))
This is exactly what we’re expressing with our nested
dy loops, so with
cross we can collapse them together:
around:sum each delta in -1 + 3 cross 3 board[pos + delta] end
A pair (pos) doesn’t conform with a list of pairs (the deltas), but it does if we
flip the list of deltas first. In this way we can add a position to an entire list of offsets at once:
deltas: -1 + 3 cross 3 # ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),(1,0),(-1,1),(0,1),(1,1)) (10,20) + deltas # ((9,9),(20,19)) (10,20) + flip deltas # ((9,10,11,9,10,11,9,10,11),(19,19,19,20,20,20,21,21,21)) flip (10,20) + flip deltas # ((9,19),(10,19),(11,19),(9,20),(10,20),(11,20),(9,21),(10,21),(11,21))
Taking advantage of this, we can “lift” the addition from the inner loop:
around:sum each x in flip pos + flip -1 + 3 cross 3 board[x] end
Which in turn allows us to write the expression more concisely with the
@ operator instead of an explicit loop:
around:sum board @ flip pos + flip -1 + 3 cross 3
cross operator can be used to simplify our outer loops as well:
on step board do next:() each pos in SIZE cross SIZE around:sum board @ flip pos + flip -1 + 3 cross 3 if board[pos] if around in 3,4 next[pos]:1 end else if around ~ 3 next[pos]:1 end end end next end
This is a reasonably concise and straightforward solution, but it is not particularly fast. Lil dictionaries, like all of Lil’s primitive datatypes, are immutable. Every time we write a
1 to some position in
next, the Lil interpreter constructs an entirely new dictionary with the same keys and elements plus our added key-element pair, and stores the result in
next. On a large board with many living cells, this overhead can add up to a huge amount of work! Lil’s copy-on-write semantics mean dictionary insertions are (in general) linear with respect to the size of the dictionary, rather than constant.
In addition to primitive types, like lists and dictionaries, Lil is furnished with a number of interfaces, which expose system resources and provide mutable data structures. The Image interface is a perfect fit for our game board: it represents a 2D array of bytes.
We don’t have to change much to use an Image instead of a dictionary; Images can also be indexed with coordinate pairs. The
SIZE constant is no longer needed, since the Image interface itself keeps track of its dimensions:
on step board do next:image[board.size] each pos in board.size cross board.size around:sum board @ flip pos + flip -1 + 3 cross 3 if board[pos] if around in 3,4 next[pos]:1 end else if around ~ 3 next[pos]:1 end end end next end
We avoided explicitly storing
0 values for dead cells in the dictionary to keep it “sparse” and minimize the number of entries required. An Image is always “dense”, so this is no longer a concern. As a result, we can simplify our conditionals a bit, taking advantage of the fact that
if ... else ... end returns the value of the taken branch:
on step board do next:image[board.size] each pos in board.size cross board.size around:sum board @ flip pos + flip -1 + 3 cross 3 next[pos]:if board[pos] around in 3,4 else around~3 end end next end
In fact, we could do without conditionals entirely by reworking the above into a single logical expression:
on step board do next:image[board.size] each pos in board.size cross board.size around:sum board @ flip pos + flip -1 + 3 cross 3 next[pos]:(board[pos] & around in 3,4) | around~3 end next end
This Image-based solution is significantly shorter, and a good bit faster. Instead of rebuilding the board dictionary for every write, the Lil interpreter only needs to allocate a fresh Image once at the beginning of the function, and writes modify that Image in-place. Since we re-write every pixel of the Image, we could even reuse a pair of Images and swap them on every generation. In Decker, the Image-based board representation has the additional benefit that we can easily display an Image graphically in the Listener or by pasting it to a Canvas Widget, or even the Card background Image.
Lil has many features inherited from Vector-oriented programming languages (sometimes also called “Array Languages”). The essence of vector programming is learning to think about manipulating entire data structures at once, in parallel, rather than serially.
For example, if we wanted to multiply a list by a scalar number in Lil we could write a loop which processes list elements serially:
each v in mylist 3*v end
But the vector-oriented approach is to take advantage of the parallelism built into the
The Image interface includes a number of functions for doing bulk operations on its pixels which we can take advantage of:
image.map function takes a dictionary as input, and updates every pixel in the Image by looking it up through the dictionary. For convenience,
image.map also accepts an optional second argument giving a “default” value for pixels not found in the keys of the dictionary. The main purpose of
image.map is swapping colors in graphics, but it can also be used to express any pure unary function of pixels. For example, given an Image
x containing pixels with patterns
x.map[(0,1) dict (1,0)] applies a logical
NOT to the Image, and
x.map[(3,5) dict (1,1) 0] is loosely equivalent to
x in 3,5.
image.translate function shifts an Image’s pixels by a specified offset. For example,
x.translate[-1,2] shifts every pixel of an Image
x one pixel left and two pixels down. By default, pixels “shifted in” are always pattern
0 and pixels “shifted out” are destroyed, but
image.translate also accepts an optional second argument which can indicate shifted pixels should “wrap” around opposite edges.
image.merge function is for compositing Images together. One way of merging Images is to specify a binary operator (
"=") and a source Image. The pixels of both the source and target Image are paired up, the operator is applied to each pair, and the result is written back to the target Image.
Since all of these operations modify the target Image in-place, we may need to use
image.copy before applying them, to avoid destroying a dataset we need again later.
The inner loop of our current solution probes the board for the neighbors of each individual pixel by adding an offset to the coordinates of the central pixel. If we instead imagine using
image.translate, we could apply each of those offsets to the entire Image at once, and then use
image.merge["+" ...] to sum all of the live cells in the translated Images together.
We were already able to convert the
if ... else ... end statement in our outer loop into a logical expression that uses
| along with some constant numbers; given our original
board Image and the neighbor-count Image, we can perform an equivalent calculation by using
image.map. Bringing it all together,
on step board do around:image[board.size] each delta in -1 + 3 cross 3 around.merge["+" board.copy.translate[delta]] end stay:around.copy.map[((4,3) dict 1,1) 0].merge["&" board] next:around .map[( 3 dict 1) 0].merge["|" stay ] end
Give yourself a few moments to enjoy this solution. It describes the rules of Life in parallel, applying operations in bulk across the entire game board. The only explicit loop in this program iterates exactly nine times, no matter the size of the board. If we wanted, we could even “unroll” this loop and skip the
on step board do around:board.copy around.merge["+" board.copy.translate[-1,-1]] around.merge["+" board.copy.translate[-1, 0]] around.merge["+" board.copy.translate[-1, 1]] around.merge["+" board.copy.translate[ 0,-1]] around.merge["+" board.copy.translate[ 0, 1]] around.merge["+" board.copy.translate[ 1,-1]] around.merge["+" board.copy.translate[ 1, 0]] around.merge["+" board.copy.translate[ 1, 1]] stay:around.copy.map[((4,3) dict 1,1) 0].merge["&" board] next:around .map[( 3 dict 1) 0].merge["|" stay ] end
While this is bulkier, avoiding the overhead of the loop and cross product is slightly more efficient, and it is clearer to a reader that the entire function is a straight “pipeline” of Image-manipulation operations.
With some additional pondering (or at least a perusal of the relevant literature), we might arrive at a slightly more elegant approach first attributed to Donald Knuth in the context of his METAFONT language. The key idea is to reorder the translations so that several neighbors can be summed together at the same time, and to double-count the cells surrounding the central cell, which simplifies the update rule:
on step board do horiz:board.copy .merge["+" board.copy.translate[-1,0]] .merge["+" board.copy.translate[ 1,0]] vert:horiz.copy .merge["+" horiz.copy.translate[0,-1]] .merge["+" horiz.copy.translate[0, 1]] next:vert.copy .merge["+" vert.merge["-" board]] .map[((5,6,7) dict 1,1,1) 0] end
All three of these formulations are tremendously faster than our earlier attempts, and scale better to larger boards. Instead of leaving the Lil interpreter to do most of the heavy lifting, iterating over and making decisions about individual pixels one at a time, we have pushed all of the work into
image.merge, which contain tight loops written in C (or JS, in Web-Decker).
Let’s compare the runtime for several approaches to Life:
For each implementation, we will initialize an appropriately-sized board with random cells. Lilt’s
random number generator is seeded consistently by default, so for a given board size each run will have the same configuration and final result. We will then
step the board for 100 generations and print the final board. Each overall execution time will be measured on the same laptop with the unix
time utility, taking the best
total time (in seconds) of three attempts:
|Approach||20x20 board||40x40 board||60x60 board|
The usual lessons for improving program performance apply: Avoid unnecessary or redundant work, choose appropriate data structures, and parallelize.
Throughout our examples, we have focused on a version of Life which implicitly fixes the edges of the board as dead. What modifications to each implementation would be necessary to instead simulate Life on a toroid, in which cells are considered adjacent across opposing edges? How does this impact performance and the overall elegance of the code?
Brian’s Brain is another interesting 2D cellular automaton. It uses 3 cell states (“Live”, “Dying”, and “Dead”) and slightly different rules for how cells change state. Try modifying our Life programs to simulate Brian’s Brain instead.
Rule 110 is a 1D cellular automaton which (with certain constraints and initial conditions) is capable of universal computation. Try your hand at writing an implementation! Using an Image to store the board state works, but you could also explore using Lil’s Array interface.