Lila: a Lil Interpreter in POSIX AWK

AWK is among the most ubiquitous programming languages in the world. Much like ed, the standard text editor, awk is a mandatory component of any POSIX operating system. On a fresh-out-of-the-box Mac, tunneling into the embedded Linux environment on your router, or scrabbling away at GitBash on a Windows machine, you may not have access to Python, Perl, or even Tcl, but you can rest assured that some flavor of awk is already installed. This is a powerful proposition.

In our modern age, AWK is frequently regarded as a domain-specific language akin to sed, constrained in its usefulness to batch processing of semi-structured text files and streams. While many features of the language are indeed tailored to this ecological niche, AWK is a fully general-purpose language; AWK provides concise and familiar control structures, floating-point arithmetic, a small but flexible toolkit of string manipulation operations, and powerful associative arrays. I could simply tell you that AWK can do nearly anything, but perhaps it would be more persuasive to show you?

The Name’s Lila

Lila is an implementation of Lil in AWK. Lil is a multi-paradigm language with a rich set of primitive operators and a “batteries-included” standard library featuring such goodies as JSON, XML, CSV and structured binary parsing, first-class database-style tables with a SQL-like query syntax, and functional niceties like a REPL and tail-call elimination.

% ./lila.awk
 select k:first value v:count value by value from "AABABBBBBBABACBA"
+-----+---+
| k   | v |
+-----+---+
| "A" | 6 |
| "B" | 9 |
| "C" | 1 |
+-----+---+

A complete Lil system demonstrates by construction solutions to a broad range of interesting and practical programming tasks. It also provides some intriguing possibilities for making the Lil ecosystem more self-hosting and portable: Lila can subsist anywhere there’s an AWK, even places C compilers and JavaScript interpreters dare not venture!

In broad strokes, Lila’s interpreter closely follows the structure of the C-based Lil interpreter: It tokenizes and parses source code strings, assembling them in a single pass into a stream of bytecode instructions which are executed by a stack-based virtual machine. Primitive operators and the guts of Lil’s “Interface” values are AWK functions written in terms of boxed “Lil-values” which represent Lil’s core data structures: numbers, strings, lists, dictionaries, tables, and functions.

Since AWK associative arrays cannot be recursively nested, an AWK associative array cannot directly represent, say, a Lil dictionary. Instead, Lila uses a collection of global associative arrays to represent a heap of Lil-value “structs” keyed by numeric indices. From the perspective of most of the AWK code that uses Lil-values, these indices are simply pointers! All the members of Lil-values are represented as AWK numbers or strings.

Performance Anxiety

Computers, one often hears, are pretty fast. Lila’s interpreter, however, is doing quite a bit of indirection and string acrobatics with each expression. Is it still usably performant, or are we looking at speeds rivaled by dead snails in molasses?

An experiment was performed on my 2020 M1 Macbook Air comparing the C and JS-based reference implementations of Lilt with Lila running under a variety of AWK implementations: mawk, gawk, goawk, and the BSD awk which ships with MacOS (v20200816 in this case). As my benchmarks, I used Mandel.lil, a naive Mandelbrot set renderer designed to hammer the Lil interpreter and garbage collector, and the Lil integration test suite:

Interpreter Mandel Tests
c-lilt (release) 0.08s 0.27s
c-lilt (debug) 0.36s 0.79s
js-lilt 0.42s 1.58s
mawk 3.61s 0.95s
gawk 6.58s 2.47s
goawk 10.66s 4.32s
awk 15.09s 162.97s

Of the AWKs tested, mawk makes a truly impressive showing. The “one-true” AWK fares much poorer, particularly with the integration suite. I ran a similar experiment on my 9-year old intel-based Macbook Air which provides awk version 20070501, and- curiously- even on that far humbler machine I was able to drastically outstrip the M1 with a more “modern” AWK release, completing the integration suite successfully in about 16 seconds.

Your mileage may vary, but overall I found that even on my slowest computers and their most ancient AWKs Lila offers a viably interactive REPL experience.

There aren’t many monolithic AWK scripts on the internet of a similar scale and complexity to lila.awk. If you’re the maintainer of an AWK- or you’re thinking of becoming one- Lila might be a useful tool for investigating performance and correctness.

Conclusion

Hopefully this article has mildly piqued your interest in AWK, Lil, or their combination. The next time you find yourself drawing in a Python dependency for simple command-line automation or struggling to express complex logic in a shell script, consider reaching for AWK.

Sometimes the best tool is the tool you already have.

back