Transducers Speed Up JavaScript Arrays
Processing Large JavaScript Arrays is Really Slow
In this article, I will be going over ways of improving the processing speed of large JavaScript arrays using transducers. This article builds on the findings in my previous article: Speeding up JavaScript Array Processing.
tl;dr JavaScript array processing is slow, but with transducers, you can solve a lot of those issues. Still, you’ll want to consider your native options first.
I never thought I’d need it, but I actually needed to use my solution from Speeding up JavaScript Array Processing to speed up the loading of a Chrome plugin I was writing recently.
This plugin loaded over 26,800 items in a list and did some processing on them. Surprisingly, even with a few filter and map functions, it was pretty fast. That is, until I started doing some advanced processing and needed to double the size of the array to over 53,700 items. That's when I noticed a slowdown of many orders of magnitude.
Disclaimer: All code examples are made-up, but I made them similar to the actual implementation. The processing times in these examples are much faster than in my actual implementation, but on slower machines, you'll see even worse slowdowns.
Everything Was Working Fine
Before making any changes, let’s look at the original code:
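The real plugin code isn't shown here; this made-up sketch follows the same shape, with an invented "namespace:value" item format standing in for the actual strings:

```javascript
// Made-up stand-in for the original processing: a few filters and maps
// over ~26,800 strings. The "namespace:value" format is invented here.
const processItems = items => (
  items
    .filter(item => item.includes(':'))
    .map(item => item.split(':'))
    .map(([namespace, value]) => ({ namespace, value }))
)
```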
This could possibly be improved by using transducers, but I didn’t think it was necessary since the processing time averaged 8.5ms.
Even still, let’s look at the transducer version:
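Again, this is a made-up equivalent rather than the actual plugin code. Because from(items) emits synchronously, the subscribe callback runs before the function returns, so it behaves like the native Array version:

```javascript
import { from } from 'rxjs'
import { filter, map, toArray } from 'rxjs/operators'

// Made-up RxJS equivalent of the same processing. from(items) emits
// synchronously, so subscribe fills in result before processItems returns.
const processItems = items => {
  let result = []

  from(items)
    .pipe(
      filter(item => item.includes(':')),
      map(item => item.split(':')),
      map(([namespace, value]) => ({ namespace, value })),
      toArray(),
    )
    .subscribe(values => { result = values })

  return result
}
```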
Based on my previous article, you'd think this would be faster, but it's not. Clocking in at 22.1ms, there just aren't enough items in the list to make transducers a faster alternative to the native Array methods, nor are there enough transformations on the data. This pipeline is pretty simple, so unless we add more items or more operators, it's not going to result in a speed improvement.
The Slowdown
This plugin was near-instantaneous to load, but all of a sudden, this one tiny little change had it spending 35 seconds doing absolutely nothing but processing a JavaScript array. How? What did I do?
Because of a bug in my handling of the requirements and the way Chrome was handling this array of strings, I needed to add twice as many items to the list.
Here’s what that change looked like:
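In this made-up version, the doubling shows up as a reduce that concats two values per item (the transformations themselves are still invented):

```javascript
// Made-up sketch of the change: every item now also produces a second,
// tweaked copy, and the reducer concats both onto the accumulator.
const processItems = items => (
  items
    .filter(item => item.includes(':'))
    .map(item => item.split(':'))
    .reduce((acc, [namespace, value]) => (
      acc
        .concat({ namespace, value })
        .concat({ namespace, value: value.toUpperCase() })
    ), [])
)
```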
Disclaimer: While it doesn't make sense out-of-context in this fake code example, the requirements of the project needed the second concat version as well.
I run each benchmark 6 times, and at an average of 12,300ms, this was by far the most boring one to sit through. Talk about slow!
But let's look at some ways to fix this. I know the concat function creates a new array each time it's run. That's definitely going to be part of the problem, so I utilized the fact that concat accepts more than one argument:
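Sketched out, that means passing both new values to a single concat call:

```javascript
// Made-up sketch: one concat call per item, with both new values passed
// as separate arguments.
const processItems = items => (
  items
    .filter(item => item.includes(':'))
    .map(item => item.split(':'))
    .reduce((acc, [namespace, value]) => (
      acc.concat(
        { namespace, value },
        { namespace, value: value.toUpperCase() },
      )
    ), [])
)
```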
This subtle change of reducing the number of concat calls took us down by half: 6600ms. You could've probably guessed this number though, since reducing the number of arrays created by half reduces the total processing time by half.
And we can actually bring this down further! Instead of having concat check each argument we've passed, we can simply give it an array instead. If you didn't know, concat actually flattens any arrays it's given:
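So the sketch becomes:

```javascript
// Made-up sketch: build the pair of new values up front and hand concat a
// single array, which it flattens into the accumulator.
const processItems = items => (
  items
    .filter(item => item.includes(':'))
    .map(item => item.split(':'))
    .reduce((acc, [namespace, value]) => (
      acc.concat([
        { namespace, value },
        { namespace, value: value.toUpperCase() },
      ])
    ), [])
)
```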
And with this simple change, we're now down to a tolerable 2600ms. Why? I can only guess it's because we're creating the array ahead of time, so concat doesn't have to process its arguments object itself. The arguments object is the older way of handling variadic arguments, and it only exists in regular function definitions, not in fat-arrow functions.
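As a quick aside (not from the original code), here's the difference in plain JavaScript:

```javascript
// The arguments object exists inside regular function definitions...
function oldStyle() {
  return arguments.length // works
}

// ...but not inside fat-arrow functions, which use rest parameters instead.
const newStyle = (...args) => args.length

oldStyle(1, 2, 3) // 3
newStyle(1, 2, 3) // 3
```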
If you're curious, you can find out more in MDN's documentation on the arguments object.
The Speedup
As you can see, this code, while it looks modern and functional, still suffers pretty badly because of how the reducer’s being used. RxJS transducers to the rescue!
Why Use RxJS?
As in the previous article, we'll be using RxJS again. As you probably know, I'm familiar with RxJS and have already written methods around subscribe so I can use it synchronously, just like JavaScript's Array methods.
While I could’ve used an off-the-shelf transducer library for this project, it’s a Chrome plugin that heavily uses RxJS. Bringing in another library adds more complexity to the project and since I’ve been contracted to do this work, I want to reduce the amount of tooling another person will have to learn to maintain the project. Bringing in RxJS was already iffy, but because of the async nature of this particular plugin, it’s already reduced a lot of complexity.
We’ve Gone to Plaid!
Back to our transducer version, let’s try this again:
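Sketched with the same made-up transformations, the doubled-output pipeline still leans on a reduce that concats per item:

```javascript
import { from } from 'rxjs'
import { filter, map, reduce } from 'rxjs/operators'

// Made-up transducer take on the doubled output. The reduce operator still
// concats a new array for every item, just like the native version did.
const processItems = items => {
  let result = []

  from(items)
    .pipe(
      filter(item => item.includes(':')),
      map(item => item.split(':')),
      reduce((acc, [namespace, value]) => (
        acc.concat([
          { namespace, value },
          { namespace, value: value.toUpperCase() },
        ])
      ), []),
    )
    .subscribe(values => { result = values })

  return result
}
```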
2300ms? Really? That’s not the speed boost I was expecting considering our numbers in the previous article.
That's where this test fails. Transducers are a completely different way of thinking about array processing. Under the hood, transducers are a bunch of wrapping functions in a reduce loop. That means having 2 reduce operations probably slows things down quite a bit; in fact, it's O(n²).
Utilizing mergeMap, we've finally gotten to a more reasonable 70.6ms. We've completely gotten rid of the array-like reduce and switched to the faster mergeMap transducer. Now, we process all of our values and only put them into an array at the very end. That's the key to this speedup: we both reduced the number of times we created an array (to one) and delayed it as long as possible (till the end).
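A sketch of that mergeMap approach, again with made-up transformations, might look like this:

```javascript
import { from, of } from 'rxjs'
import { filter, map, mergeMap, toArray } from 'rxjs/operators'

// Made-up sketch: mergeMap fans each item out into its two output values as
// separate emissions, and toArray builds the one and only array at the end.
const processItems = items => {
  let result = []

  from(items)
    .pipe(
      filter(item => item.includes(':')),
      map(item => item.split(':')),
      mergeMap(([namespace, value]) => of(
        { namespace, value },
        { namespace, value: value.toUpperCase() },
      )),
      toArray(),
    )
    .subscribe(values => { result = values })

  return result
}
```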
This is what you were expecting, right? Well, hopefully not! I think we can still get it down further. We have to be able to get much closer to our original 8.5ms; otherwise, I consider this a failure.
Now, since the initial transducer test was 22.1ms, we can reasonably assume that's the best we can do. And since we're duplicating the array size, 40ms is our target. We've still got a ways to go.
Duplicating our array logic along with a couple of mergeMap operators, this is what I've come up with:
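Roughly, the sketch splits into two near-identical pipelines that get flattened together, with a single toArray at the end (my approximation, not the plugin's exact code):

```javascript
import { from, of } from 'rxjs'
import { filter, map, mergeMap, toArray } from 'rxjs/operators'

// Made-up approximation: two near-identical pipelines, one per output
// variant, flattened with mergeMap so only one array is created at the end.
const processItems = items => {
  let result = []

  const originals = from(items).pipe(
    filter(item => item.includes(':')),
    map(item => item.split(':')),
    map(([namespace, value]) => ({ namespace, value })),
  )

  const uppercased = from(items).pipe(
    filter(item => item.includes(':')),
    map(item => item.split(':')),
    map(([namespace, value]) => ({ namespace, value: value.toUpperCase() })),
  )

  of(originals, uppercased)
    .pipe(
      mergeMap(stream => stream),
      toArray(),
    )
    .subscribe(values => { result = values })

  return result
}
```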
I thought, since we were only creating a single array, this would be pretty fast, but it’s not. At an average of 51.5ms, it’s clearly not meeting our numbers.
Also, notice how I duplicated the filter logic? That's temporary. In a real situation, those operators could be combined into a new operator usable in both pipelines. That way you don't have maintenance issues in the future. Just so you know, I did try running the filter logic before processing the two lists, but that ended up being slower because I had to create an intermediate array.
Now I bring you to the solution, the real solution, that I used in that plugin. It's complex and ugly, but in the plugin, I was actually able to get back to the original processing speed, which is actually faster than just doing the one map.
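In sketch form, each pipeline ends in its own toArray, and forkJoin waits for both (my approximation of the approach, not the plugin's actual code):

```javascript
import { forkJoin, from } from 'rxjs'
import { filter, map, toArray } from 'rxjs/operators'

// Made-up approximation: each pipeline builds its own smaller array, and
// forkJoin emits both finished arrays so they can be combined at the end.
const processItems = items => {
  let result = []

  const originals = from(items).pipe(
    filter(item => item.includes(':')),
    map(item => item.split(':')),
    map(([namespace, value]) => ({ namespace, value })),
    toArray(),
  )

  const uppercased = from(items).pipe(
    filter(item => item.includes(':')),
    map(item => item.split(':')),
    map(([namespace, value]) => ({ namespace, value: value.toUpperCase() })),
    toArray(),
  )

  forkJoin([originals, uppercased])
    .subscribe(([originalValues, uppercasedValues]) => {
      result = originalValues.concat(uppercasedValues)
    })

  return result
}
```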
This is pretty similar to the previous version except that we're using forkJoin instead of mergeMap. The difference is that forkJoin returns an array of both results. We're actually creating 2 smaller arrays which we then combine into a larger one, whereas in the previous example, we were creating one big array from a bunch of processed items.
At this point, I can’t tell you why this version is faster since it’s beyond my knowledge of RxJS. I would’ve assumed the previous version would be faster, but apparently not.
Still, we were able to get 41.2ms which matches up with our estimates. But it doesn’t seem good enough to me. I think we could get this down to at least 17ms.
A Sudden Realization
This dual-processing method got me thinking: why not do the same with map? In fact, it might be both faster and easier to read!
This is what that looks like:
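In sketch form, that's two plain filter/map passes and a single concat at the end:

```javascript
// Made-up sketch of the same dual-processing idea with native Array methods:
// build each half separately, then concat them once at the end.
const processItems = items => {
  const originals = items
    .filter(item => item.includes(':'))
    .map(item => item.split(':'))
    .map(([namespace, value]) => ({ namespace, value }))

  const uppercased = items
    .filter(item => item.includes(':'))
    .map(item => item.split(':'))
    .map(([namespace, value]) => ({ namespace, value: value.toUpperCase() }))

  return originals.concat(uppercased)
}
```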
And yes, it's in range of our estimates, clocking in at 19.8ms. This is the solution I was looking for all along, but I went a different route because I didn't realize how simple it could've been. Seriously though, if you saw something taking 12+ seconds, what would be your first reaction?
Conclusion
As you've seen, transducers can introduce some major speed benefits, even over our optimized reduce example, but until you have a large number of items or a lot of pipeline operators, they're going to be limited in what they can achieve over native Array methods in JavaScript.
That's not to say all transducer libraries are like this though. I'm using RxJS, which is designed to process values over time; it's not optimized for array processing like this and merely ties in to my previous articles, which also use RxJS in interesting ways. Had I been using a real transducer library, I wonder how much faster this code would've been.
I'm honestly surprised by the results of this article. That Chrome plugin was processing a lot slower for whatever reason, so the transducer version actually had a major impact on processing speed, but in these benchmarks, Node.js v10 has some optimizations in place which sped things up a lot for the native Array methods. That, or they slowed down the transducer methods 😡. I'll never know! The good thing is, the transducer version only ended up being about 10–20ms slower. I bet it wouldn't take that many more items to change these results.
The Breaking Point
Ok, I was curious and ran some more benchmarks, comparing the fastest array and transducer methods with a list of 500,000 items. The transducer method was faster by 100ms. Finally! Something that matched my expectations.
With this particular dataset, I was able to find the tipping point: about 250,000 items. Past that, the gap widens quite substantially. Now I'm curious how processor speed and available memory affect these benchmarks. Will they favor transducers or native array methods?
More Reads
If you've got an interest in more topics related to RxJS, you should check out my other articles: