“How do I do that in TypeScript?”: Generator expressions

I wrote in “Some things I love about advent of code” about how I was using Advent of Code to help learn TypeScript. I did some of the old puzzles in Python – a language I’m comfortable with – then redid them in this language I’m brand new to. This gave me a chance to look at “what would I write if I knew the language” and ask “so how do I do that in TypeScript?”

A range of ovals made of punctuation symbols in different colours - they look a bit like islands with different terrain. One is grassy, one has volcanoes. Various bright yellow stars and dotted around.
The Advent of Code ASCII-art front page for 2023 – gets filled in as you solve the puzzles

I’m a fan of using generator expressions in Python, both for coding puzzles and in real-life code, so I had a go at making a TypeScript class to let me do something similar there. It’s in npm (generator-sequences) if you’d like a look now, or we can read through my thinking first.

What was I trying to do?

Generator expressions in Python don’t mind where their data comes from, which is a nice feature for testability. Advent of Code puzzles usually have a long input file, but I can test-driven design functions and pipelines by passing them an array of strings or whatever is needed, and then switch to feeding them lines from the input file for doing the real work. In Python, you can use a generator to get individual lines from a file:

def lines_from_file(path):
    with open(path, 'r') as f:
        for ln in f:
            yield ln.rstrip()

This avoids loading the whole file into memory at once – it scoops up a small chunk (around 4k bytes), and hands out one line at a time as they’re asked for, quietly getting another small chunk from disk each time one’s needed. That means you can use this in a for loop, just as if it was an array.

Then, you might need to do various operations on each line – turn it into an object, expand it into a more complicated set of values, make a note of the results – it’s good to avoid keeping all these in memory longer than you have to. Generator expressions let you queue up work and only pull values through it as needed.

# For testing, use an array
input = ["1", "20", "3", "44"]

# Or in real code, use a possibly huge file
input = lines_from_file("./data.txt")

# Run some conversions and calculation
nums = (int(ln) for ln in input)
highs = (n for n in nums if n > 10)
total = sum(highs)

What might not be obvious here: Lots of work is only done here when it’s needed. The call to lines_from_file gives you an object that’s waiting for you to ask for lines, the declaration of nums will wait for a request and then pull one value from input, convert and hand it out – so we’re never making an in-memory collection of all the contents of input. And in this code, it’s only when sum is called that any work starts: this repeatedly asks for values until input has no more to give. At any moment, there’s only one item moving through this pipeline – immediately released to the garbage collector once it’s done with.

Why would you do that in TypeScript?

Honestly, you probably shouldn’t – a classic mistake when learning a new language is to try and make it work like the language you’re more used to, instead of learning its idioms. There’s probably a more TypeScript-flavoured approach that would make sense.

However, as a learning exercise it’s been brilliant, giving me a reason to learn more about lots of language features.

When I first learned about generators in Python, I could understand how they worked but struggled to think of a compelling reason to ever use them. That changed when I read through the slides for a “Generator tricks for systems programmers” talk by David Beazley. There’s a lot of slides, but each one’s got only a little content I’d encourage you to take a look. It was slide 68 where my “no way!” moment occurred, as I saw how the straightforward, step-by-step code we’d been building up had now given us a powerful, intuitive query language that can run over gigabytes of log files. This was built up from several small, reusable functions in a pipeline that can be plugged together and swapped out in various ways.

Slide 68 from the linked PDF deck. Title is "a query language", and has a few lines of code showing how to "find the set of all documents that 404" and "print all requests that transfer over a megabyte".

If you’re interested: reading parts 1-4 will get you that thorough “so that’s what these can do” experience I had above – and parts 5 and 6 get you thinking bigger, about how some simple changes can adapt the pipeline to work with infinite sequences (e.g. a stream of logs as they arrive), across networks, and more – all without changing most of the parts you built earlier. There’s even more in there, but these are the parts that stuck with me.

Maybe it’s just me, but I found all this fascinating. There’s various situations where you can avoid pulling a whole dataset into memory and start doing operations on it (that might add even more content that takes up memory). Instead, stream just the data you need through various functions, throwing away intermediate outputs as you’re done with them. Advent of Code is an ideal place to use this kind of idea – every year there’s various “and if you do it the obvious way, now you’re out of memory” problems so I get used to bringing in only what I need whenever possible.

Python makes using generators very simple. Is there a way to get something similar in TypeScript?

Can TypeScript do this?

Python’s syntax for these is short and readable (once you’ve learned it). I don’t think I can do that in TypeScript, but I was happy to use a class with methods if I can make that similarly readable. Python expressions are equivalent to some common functional programming operations:

  • [something] for x in input is map
  • for x in input if [condition] is filter
  • sum(input) and various other functions are types of reduce
  • These can all be combined into one line, or separated out for readability.

Python doesn’t tend to use functions with these more familiar names – here’s a discussion on why.

TypeScript does have all those functions (from JavaScript), but only for arrays – that means you need to have the whole input in an array in memory, and calling map would create a whole array of all the transformed items in memory too.

You can build the kind of sequence and functionality I want using generator functions – and I found JavaScript does have those, described in this entertaining blog post from James Sinclair. Describes how they work and why you’d want them, using infinite Tim Tams (a popular snack in Australia).

Screenshot from linked blog post. It has an embedded YouTube video titled "Cate Blanchett - Tim Tam Biscuit TV Commercial", with an image of Cate holding up a lamp that looks like the type a genie comes out of.

Above the embed, there's some text: "Infinite Iterators:
Back in the 1990s, Arnotts ran a series of commercials for Tim Tams. The first of these featured Cate Blanchett meeting a genie. And involved her character wishing for a packet of Tim Tams that never runs out."
James Sinclair’s post includes videos from Australian celebrities explaining the appeal of Tim Tams.

For more on generator functions, I had a read of some javascript.info pages: “Generators” and “Async iteration and generators”. This was very helpful to understand how I can use them with sequences that pull from large files or other potentially slow sources; operations like that need to be asynchronous.

Finally, I found that the exact kind of thing I’m after is going to get added to the language at some point – proposals are in progress for iterator helpers and async iterator helpers. Good to know it’s not something that only I care about.

All this reading gave me enough information to have a go at making the kind of class I want.

Building a class

The sources above, plus lots more digging around JavaScript and TypeScript documentation, taught me what I needed to get this working. If any of it looks confusing and you’d like more explanation, or looks like I’ve made some mistake and could improve it, let me know!

Let’s make a Sequence class (in sequence.ts), to wrap whatever input source we want to get values from:

export class Sequence<T> {
    constructor(
        protected readonly seq: Iterable<T> | AsyncIterable<T>) {}

It’s generic, so you can have a Sequence of string, of number, or any other type that’s useful. It’s created by wrapping anything you can iterate over – an array, lines from a file, etc.

Then, we’ll make it usable in loops: for await (const item of input). Whatever the input source, I assume it’s async – if it’s an array or other simple type it will be ready immediately, but it’s safe to await its values anyway. The advantage of doing that is that I can test code using simple arrays, then run the exact same code to pull from files or anywhere else.

async *[Symbol.asyncIterator]() : AsyncGenerator<T, void, void> {
    for await(const item of this.seq) {
        yield item;
    }
}

Next, add methods for those operations we want. Here’s map:

map<TMap>(mappingFunction: (x: T) => TMap) : Sequence<TMap> {
    async function* apply_map(inputSeq: Iterable<T> | AsyncIterable<T>) {
        for await (const item of inputSeq) {
            yield mappingFunction(item);
        }
    }
    return new Sequence(apply_map(this.seq));
}

You can call this on a Sequence<T>, and give it some operation that takes a T and returns a TMap , to produce a new Sequence<TMap>. For example:

const nums = new Sequence([1, 2, 3]);
const doubles = nums.map(x => x * 2);
// doubles will give "2, 4, 6" when you loop through it.

You can see more examples in sequence.test.ts, demonstrating filter, reduce, and various convenient functions like sum.

I’ve also included a function to give lines from a file – this works just like the Python lines_from_file example I showed earlier, letting you iterate through huge files while only bringing small chunks into memory at a time.

import readline from "node:readline/promises";
import fs from "fs";

export function linesFromFile(path: string) : Sequence<string> {
    async function* readFile() {
        for await (const line of readline.createInterface({input: fs.createReadStream(path)})) {
            yield line;
        }
    }
    return new Sequence(readFile());
}

I’m happy with how all these work together. For example, in Advent of Code 2016 day 4 (puzzle description), you can see my code in github. I made and tested a function to parse a line of text into a “room”, and one to test whether it’s a “real room”, and then used these sequences to stream the whole input file through these operations.

const input = linesFromFile(filepath);

const realRooms = input.map(parseRoom).filter(isRealRoom);
const sumOfSectorIds = Sequence.sum(realRooms.map(r => r.sectorId));

Packaging things up

I did this for the December 2023 Advent of Code, and saw the kind of features I want were in discussion for being added to JavaScript anyway, so I assumed I wouldn’t need my class again. Checking again now, as December 2024 approaches, I see those proposals aren’t implemented yet – iterator helpers is almost there, but async iterator helpers (the one I really want) looks further off.

So, to help me learn some more parts of JavaScript/TypeScript, and let me easily reuse the code in this year’s puzzles, I’ve published it as an npm package (first time I’ve tried that!)


Posted

in

by

Tags:

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *