Lazy Pipelines with Generators in TypeScript

Wim Jongeneel · Published in ITNEXT · Jan 25, 2020

In recent years the JavaScript community has embraced the functional array methods like map and filter. Writing for-loops has become something associated with 2015 and jQuery. But the array methods in JavaScript are far from ideal when it comes to performance. Let's look at an example to clarify the issues:
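Something along these lines (the concrete values here are illustrative):

```ts
// An eager array pipeline: every step allocates a full intermediate
// array, even though only the first matching item is ever used.
const result = [1, 2, 3, 4, 5]
  .map(x => x * 2)        // allocates [2, 4, 6, 8, 10]
  .filter(x => x > 5)[0]; // allocates [6, 8, 10], then takes 6
```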

This code will execute the following steps:

  • create the array with 5 items
  • create a new array with all the numbers doubled
  • create a new array with the numbers filtered
  • take the first item

This involves a lot more work than is actually needed. The only thing that has to happen is that the first item that passes x > 5 gets processed and returned. Other languages (like Python) use iterators to solve this issue. An iterator is a lazy collection that only processes data when it is requested. If JavaScript used lazy iterators for its array methods, the following would happen instead:

  • [0] requests the first item from filter
  • filter requests items from map until it has found one item that passes the predicate and yields (‘returns’) it
  • map has processed an item for each time filter requested it

Here we only mapped and filtered the first three items in the array, because no more items were requested from the iterator. There were also no additional arrays constructed, because every item goes through the entire pipeline one after the other. This concept can result in massive performance gains when processing a lot of data.

Generators and iterators in JavaScript

Luckily for us, JavaScript does actually support the concept of iterators. They can be created with generator functions that yield the items of the collection. A generator function looks as follows:
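A minimal sketch (the function name numbers is just for illustration):

```ts
// A generator function: calling it returns an iterator,
// not an array. Nothing runs until an item is requested.
function* numbers(): Generator<number> {
  yield 1;
  yield 2;
  yield 3;
}

for (const n of numbers()) {
  console.log(n); // 1, 2, 3 — one item at a time
}
```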

Here the for-loop requests an item from the iterator on each iteration. The generator function uses the yield keyword to return the next item in the collection. As you can see, we can yield multiple times to create iterators that contain multiple items. There is never any array constructed in memory. We can make this easier to understand by removing some of the syntactic sugar:
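Roughly, the desugared version of that loop looks like this:

```ts
const iterator = numbers();
let result = iterator.next();
while (!result.done) {
  console.log(result.value);
  result = iterator.next(); // request the next item from the generator
}
```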

Here you can see that an iterator has a next method for requesting the next item. The result of this method holds the value and a boolean indicating whether there are more results left in the iterator. While this is all very interesting, we will need a few more things if we want to construct proper data pipelines with iterators:

  • conversions from arrays to iterators and back
  • iterators that operate on other iterators, like map and filter (also called ‘higher-order iterators’)
  • a proper interface to chain all of those together in an elegant and practical way

In the rest of this article I will show how to do those things. At the end I have included a link to a library I created that contains a lot more features. Sadly, this is not a native implementation of lazy iterators. This means that there is overhead, and in a lot of cases the library is not worth it. But I still want to show you the concept in action and discuss its pros and cons.

Iterator constructors

We want to be able to create iterators from multiple data sources. The most obvious one is arrays. This one is quite easy: we loop over the array and yield all its items:
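A sketch of such a constructor; the name fromArray is mine and may differ from the library's:

```ts
function* fromArray<T>(array: T[]): Generator<T> {
  for (const item of array) {
    yield item; // items are handed out one at a time, on request
  }
}
```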

Turning an iterator into an array requires us to call next until we have gotten all the items, after which we can return our array. Of course, you only want to turn an iterator into an array when absolutely needed, because this function causes a full iteration.
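A sketch, assuming a toArray helper that drives the iterator with next:

```ts
function toArray<T>(iterator: Generator<T>): T[] {
  const result: T[] = [];
  let next = iterator.next();
  while (!next.done) { // a full iteration: every item is requested
    result.push(next.value);
    next = iterator.next();
  }
  return result;
}
```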

Another method for reading data from an iterator is first. Its implementation is shown below. Note that it only requests the first item from the iterator! This means that all the following potential values will never be calculated, resulting in less wasted work in the data pipeline.
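A sketch; returning undefined for an exhausted iterator is my assumption:

```ts
function first<T>(iterator: Generator<T>): T | undefined {
  const next = iterator.next(); // only one item is ever requested
  return next.done ? undefined : next.value;
}
```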

In the complete library there are also constructors that create iterators from functions or ranges.

Higher-order iterators

A higher-order iterator transforms an existing iterator into a new iterator. These iterators make up the operations in a pipeline. The well-known transform function map is shown below. It takes an iterator and a function and returns a new iterator where the function is applied to all items of the original iterator. Note that we still yield item-for-item and preserve the lazy nature of the iterators while transforming them. This is very important if we want to actually achieve the higher efficiency I talked about in the intro of this article!
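A sketch of a lazy map over iterators:

```ts
function* map<A, B>(iterator: Generator<A>, fn: (item: A) => B): Generator<B> {
  for (const item of iterator) {
    yield fn(item); // transform and hand over one item at a time
  }
}
```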

Filter can be implemented in a similar way. When asked for the next item, it keeps requesting items from its inner iterator until it finds one that passes the predicate. This item is yielded, and execution is halted until the request for the next item comes in.
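A sketch of a lazy filter:

```ts
function* filter<T>(
  iterator: Generator<T>,
  predicate: (item: T) => boolean
): Generator<T> {
  for (const item of iterator) {
    if (predicate(item)) {
      yield item; // pause here until the next item is requested
    }
  }
}
```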

Many more higher-order iterators can be constructed with the same concepts I have shown above. The complete library ships with a lot of them; check them out over here.

The builder interface

The last part of the library is the public-facing API. The library uses the builder pattern to let you chain methods just like on arrays. This is done by creating a function that takes an iterator and returns an object with the methods on it. Those methods can call the constructor again with an updated iterator to enable chaining:
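A sketch of such a builder; the entry point iter and the exact method set are assumptions on my part:

```ts
function iter<T>(iterator: Generator<T>) {
  return {
    // each operation wraps the iterator and re-enters the builder
    map: <U>(fn: (item: T) => U) => iter(map(iterator, fn)),
    filter: (predicate: (item: T) => boolean) => iter(filter(iterator, predicate)),
    // terminal operations leave the builder
    first: () => first(iterator),
    toArray: () => toArray(iterator),
  };
}
```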

The example from the start of this article can now be written as below. In this implementation we don't create additional arrays and only process the data that is actually used!
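With the helpers sketched above, it might read:

```ts
const result = iter(fromArray([1, 2, 3, 4, 5]))
  .map(x => x * 2)
  .filter(x => x > 5)
  .first(); // 6 — only the first three source items are ever processed
```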

Conclusion

In this article I have shown you how generators and iterators can be used to create a powerful and very efficient library for processing lots of data. Of course, iterators are not a silver bullet that will fix everything. The gains in efficiency come from saving on unnecessary calculations. How much this means in real numbers depends entirely on how many calculations can be optimized away, how heavy those calculations are, and how much data you are processing. When there are no calculations to save, or the collections are relatively small, you may even lose performance to the overhead of the library.

The full source code can be found on GitHub and contains more features than fitted in this article. I would love to hear your opinion on this. Do you think it is a pity that JavaScript doesn't use lazy iteration for its array methods? And do you think that using generators is the way forward for collections in JavaScript? If JavaScript used lazy iterators by default, engines should be able to optimize the overhead away (as other languages have done) while still preserving the potential efficiency wins.

