Curious About the Impact (If Any) When Using map() and filter() Together

Tell us what’s happening:

When mapping and filtering a large array of objects, I was curious whether the order in which map and filter are applied has any impact on efficiency, or if it doesn’t really matter…

Challenge: Use the filter Method to Extract Data from an Array

Link to the challenge:

My Solution

This code filters first and then maps the remaining items:

*** SPOILER ALERT ***

var filteredList = watchList
  .filter(item => {
    return parseFloat(item['imdbRating']) >= 8.0;
  })
  .map(item => {
    return {
      title: item['Title'],
      rating: item['imdbRating']
    };
  });

versus this code, basically the same as Solution 1 (see the Hints page), which maps every item first and then filters the results:

*** SPOILER ALERT ***

var filteredList = watchList
  .map(item => {
    return {
      title: item['Title'],
      rating: item['imdbRating']
    };
  })
  .filter(item => {
    return parseFloat(item['rating']) >= 8.0;
  });

It would seem to me that since the filter is being applied to a specific key:value pair in the object, it shouldn’t matter. Is this correct?

Or perhaps internally (transparently to the coder/programmer) it has to iterate through and check every key:value pair to see if it matches on the key, grab the value associated with that key, and then do the filtering in a similar way.

If that were the case, then it seems more efficient to first grab only what you need and then filter through it, unless it has to parse through everything anyway; in that case it’s a moot point and either way has equal efficiency.

Any thoughts?

1 Like

The general rule is to filter before you map. A filtered list is smaller, so the map has less work to do.

3 Likes

The filter is going to go through every single element and test it against the callback.

But is it better to map or filter first? Well, if I have a list of 100 items but I only want to use 15… If I filter first, I run the filter over 100 elements and then map only the 15 that pass. If I do it the other way, I map all 100 and then run the filter over all 100 mapped elements. So even in terms of iterations, filtering first does a bit less work.

The thing that tips the scales for me, though, is that both of those methods will create a new data array - it makes sense to me to keep that intermediate array small, so I’d want to filter first.
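To make that concrete, here’s a rough sketch (the data and the pass rate are made up for illustration, not taken from the challenge) that counts how many times each callback actually runs in the two orders:

// Hypothetical data: 100 items, 15 of which pass the filter.
const items = Array.from({ length: 100 }, (_, i) => ({ rating: i < 15 ? 9 : 5 }));

let calls = { filter: 0, map: 0 };

// Filter first: the filter callback runs 100 times, the map callback only 15 times.
items
  .filter(item => { calls.filter++; return item.rating >= 8; })
  .map(item => { calls.map++; return item.rating; });
console.log(calls); // { filter: 100, map: 15 }

calls = { filter: 0, map: 0 };

// Map first: the map callback runs 100 times, and the filter callback then
// runs over all 100 mapped values, keeping 15 of them.
items
  .map(item => { calls.map++; return item.rating; })
  .filter(rating => { calls.filter++; return rating >= 8; });
console.log(calls); // { filter: 100, map: 100 }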

In most applications, it won’t make much of a difference, though.

4 Likes

@JeremyLT @kevinSmith Thanks for your replies!

Yeah that’s kind of what I was thinking and it makes more sense to me as well. Especially since map() and filter() are Array Iteration methods.

Then I got to thinking about how variables (say let a = 5;), or any kind of pointer/reference, are retrieved from memory.

Do JavaScript (or any language, for that matter) and the machine know exactly where the value is stored in memory and immediately retrieve it, or is that also a form of iteration? Does the machine have to go through the RAM to find where a is stored and then retrieve the value?

I guess this is probably getting deeper into the internals of how a language and the machine interact. And perhaps on a modern 64-bit GHz machine with gigabytes of fast RAM, it happens so fast that it’s instantaneous to us humans :slightly_smiling_face:

Yes, if you use assembly language or a system programming language a step up from that (C or Rust, for example), you say “I want this much memory”, then you say “I want to put this value in this bit of memory”, and so on.

JavaScript runtimes are generally written in a system programming language (C++ normally), but JS itself is a garbage collected language. You don’t manage the memory yourself*. When objects come into scope, the language runtime decides how to handle them, where to put them in memory. When objects go out of scope, the language runtime cleans up that memory.

When you declare let a = 5, the runtime will try to store that as efficiently as possible, but how and where it’s stored in volatile memory (RAM) is going to be context-dependent (where in the code is it defined?) and runtime-dependent (how does this particular runtime deal with storing values in this particular context?).

Ideally, your let a = 5 just stores that value at a certain place in memory directly**. But there’s a load of extra stuff on top of this in JS.

It’s difficult to actually get the assembly language output of, e.g., the V8 JS engine (the one that powers Chrome/Chromium/Node etc.), but as an illustration, if I were to print the assembler output of something like [1,2,3,4,5].filter(v => v < 4).map(v => v * 2);, that’s gonna be at least hundreds of KB, possibly a few MB. So “low level” optimising in JS is generally a bit silly: the JS engine is doing far too much under the hood for that to be at all sensible.

But laying stuff out logically, that does have clear effects. JS engines tend to be very fast, but this doesn’t mean they don’t have to do loads of work. Reducing the amount of work they do, that’s often not too difficult. You just need to understand what the methods you’re using are doing at a basic level (not low-level, just “what do map/filter do”).

So with map & filter, both of those go through an entire array, use a closure to keep hold of values in the array, create a new array with the result, and do a load of other things besides. Even creating an array requires quite a lot of work.

So if you put map first, you get an array created that is the same size as the original one. Then that array gets fed into filter.

But if you put filter first, then it is likely that the array that it creates is smaller than the original one. The result of filter then gets fed into map, which, in turn, has to do less work. If you’ve got very big arrays, or you’re doing something that takes quite a bit of work in the filter/map functions, this will visibly make a difference.

Yes. JS engines and the computers they run on are often plenty fast enough for it to make very little difference to the human eye. There was a programmer called Joe Armstrong who had a quote re speeding up language runtimes (I forget the source so I’m paraphrasing), and he basically said “if you want your language to run faster, just wait a few years”. Computers speed up. Coupled to that, the web runs on JS, so Google et al have poured money into optimising JS engines to the nth degree. So don’t worry too much about the low-level mechanics (worry a bit though! It’s a good thing to worry about and investigate, it’s really instructive! :slightly_smiling_face: ).

* with a caveat: there’s also WebAssembly (WASM), where you do manage the memory yourself, but that’s normally written in, say, C, C++, or Rust and compiled to WASM code.

** & re. storing values, there’ll be a lookup table: instead of iterating through memory, the computer just checks what is stored at the slot for that memory address in order to access the value.

3 Likes

Thank you for that very informative, clear, well thought out answer! :+1:

It makes a lot more sense to me now. All the answers above are informative, satisfactory and appreciated. Too bad I could only choose 1 “solution”.

I’m a self-taught IT generalist (no formal education) of 25 years, who has really only dabbled with programming in the past. So the hardware and low-level mechanics related stuff (that one might learn in a solid CS curriculum) does pique my interest. I tend to want to know the “what”, “when”, “how”, “where” and “why” about things even if I won’t use it directly in my line of work. :thinking: :grinning:

So true :laughing:

A quick and dirty benchmark can be done using

console.time("filter then map")
// execute your function here
console.timeEnd("filter then map")

The string used is simply the label for the console. Try it both ways, then try it with the old school loop. The results might surprise you.
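Put together, it might look something like this (just a sketch - the loop count is arbitrary and watchList is the challenge’s global array):

console.time("filter then map");
for (let i = 0; i < 100000; i++) {
  watchList
    .filter(item => parseFloat(item['imdbRating']) >= 8.0)
    .map(item => ({ title: item['Title'], rating: item['imdbRating'] }));
}
console.timeEnd("filter then map");

console.time("map then filter");
for (let i = 0; i < 100000; i++) {
  watchList
    .map(item => ({ title: item['Title'], rating: item['imdbRating'] }))
    .filter(item => parseFloat(item['rating']) >= 8.0);
}
console.timeEnd("map then filter");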

1 Like

Local benchmarking like that can be notoriously bad for JS, especially if you want precision. If I were to do something like that, I’d want to loop it many times and be sure to change the order of the tests.

Or I could just remember that JS is not a language made for micro-optimization.

2 Likes

oic… yeah. Not that I am interested in micro-optimization of JS… :grinning: It’s more of a theoretical/philosophical/academic curiosity.

So as a proof of concept anyway… I did as @snowmonkey suggested, with enough loops as @kevinSmith suggested to make the difference apparent, but not so many as to crash or hang my browser.

Granted, on my 3.8GHz machine w/32GB RAM the timing diffs are not huge, but it is a good visualization for me anyway…

(Each run times the same global watchList array from the challenge, e.g. var watchList = [ { "Title": "Inception", "Year": "2010", … )

i < 100000;
  map then filter: 153ms - timer ended
  filter then map: 145ms - timer ended

i < 10000000;
  filter then map: 14543ms - timer ended
  map then filter: 15742ms - timer ended

…makes sense

Out of edits for the day… should have said…

Not that I am interested in trying to micro-optimize my JS code in the “real-world” as a JS n00b with 1 user on a consumer level PC lol… but I see your point, thanks

I think there is a difference between micro-optimizations and understanding best practices. Following a general principle of ‘do less work’ is a best practice, which is why you filter before map (or reduce), and this best practice leads to better performance except in extremely strange cases.

1 Like

Yeah, and the quick & dirty, albeit imprecise, benchmark visually reinforces the point that all of you have stated in your replies: filter then map.

Well now at least I know none of you were lying to me :laughing: lol (j/k)

Thanks again!

1 Like

Yeah, I mean, I think it is good to make smart choices. I would definitely filter before map. If a PR came across my desk doing the opposite, then I definitely would comment on it.

I think it is good to make smart choices, but also to keep it in perspective. If I know that it is only ever going to be 20 elements, then it really doesn’t matter. If it’s 10 billion records, then maybe I dig a little deeper. And of course in practical terms, in web dev, anything less than 1/10 of a second isn’t going to be noticed anyway.

I always say (extending on Knuth), make smart choices, but don’t obsess about optimization until you find something whose lack of optimization is affecting performance - then optimize the crap out of it.

2 Likes

For me, I generally write code that’s for arrays with millions or billions of entries, so keeping an eye on performance implications is second nature to me. I try to follow the general principles of

  1. do no more computations than you must

  2. store no more crap than you must

and then optimize any hot spots IDed in profiling.

2 Likes

My point was not so much “here’s the best way to benchmark,” because it really isn’t. It gives you a comparison, though, that can provide some insight.

The reason I mentioned the traditional for loop? By this comparison it generally is faster. It is a different coding style, not as “functional programming” as a filter and map (or a reduce), but at this point it is an interesting point of reference.
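For reference, the old-school version of the same filter-then-map transformation might look like this (a sketch, assuming the same watchList shape used earlier in the thread):

// One pass with a plain loop: no intermediate arrays are created.
var filteredList = [];
for (var i = 0; i < watchList.length; i++) {
  if (parseFloat(watchList[i]['imdbRating']) >= 8.0) {
    filteredList.push({
      title: watchList[i]['Title'],
      rating: watchList[i]['imdbRating']
    });
  }
}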

1 Like

When you learn reduce? Do it all in one step. Completely avoid the order question. :grin:
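A single-pass reduce version might look something like this (a sketch, again assuming the challenge’s watchList):

// One pass: test and transform each item inside a single reduce callback.
var filteredList = watchList.reduce((result, item) => {
  if (parseFloat(item['imdbRating']) >= 8.0) {
    result.push({ title: item['Title'], rating: item['imdbRating'] });
  }
  return result;
}, []);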

I think one of the interesting things about Rust is that there, the functional methods are faster, and filter + map is generally faster than a single reduce.

Edit: dumb keyboard… Rest != Rust

1 Like

Yes, that seems to be true. Gather around while old uncle Kevin tells a story [puts on cardigan, lights pipe]

Back in the olden days, when I first learned programming (I think it was during the Hoover administration), we often talked about optimization because computers were slow and didn’t have a lot of memory. You had to be conscious of the trade-offs between memory and speed.

But now we live in a world where memory and speed are cheap and abundant. Now, there are other important things to consider: security, reliability, readability, maintainability, etc.

For a modern coder, most of the time, size and speed are not important factors. I would rather use a prototype method with a well-named callback that “tells” you exactly what it does than use some for loop or while loop (even if those are faster, if I remember correctly). I would rather have readable code - that makes it safer and more maintainable. There are other reasons why a prototype method is better (not exposing variables and iterators, etc.).

When you learn reduce? Do it all in one step. Completely avoid the order question.

Sure, but is it more or less readable? In most cases, you’ve saved less than 1/1000 of a second for the end user and just created more arcane code.

Another version of the Knuth quote:

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.

My point is that we often debate optimization in cases where it has no real effect on the end user and just makes for uglier, harder to maintain, more dangerous code.

Again, make smart choices, but don’t obsess unless you’ve proven that you need to.

1 Like

I generally agree with what you’re saying, but the axiom in HPC is actually that memory is expensive and a massive bottleneck. So a lot of my brainpower goes to moving as little data as possible. But my work is very special purpose.

2 Likes

Yes, there are definite edge cases where these things are really important. If you’re in one of those edge cases, obsess as needed. It’s just that most of us aren’t in one of those edge cases so the obsession would be misplaced, a distraction, and lead to lower quality code.