Efficiency of chaining multiple map commands
Map is one of my favourite coding tools. It takes a list and an “a –> b” function, and returns the list with each value replaced by the function's output. Or to put it graphically:
map f(x) [1, 2, 3, 4, 5] = [f(1), f(2), f(3), f(4), f(5)]
One habit I find myself doing is chaining multiple map commands together. I've always wondered how efficient that is, and have decided to put it to the test. I started with the hypothesis “Repeatedly calling map with simple functions is slower than calling it once with a complex function”, and wrote a small program in Racket.
(define random-numbers
(for/list ((x (make-list 100000 0))) (random 101)))
This generates 100,000 random numbers between 1 and 100 inclusive. It's probably not the best way to do it, but that's not the part being graded here. I then wrote the following two functions:
(define (ctof1 lst)
(map (lambda (x) (+ 32 x))
(map (lambda (x) (/ x 5))
(map (lambda (x) (* 9 x)) lst))))
(define (ctof2 lst)
(map (lambda (x) (+ 32 (/ (* 9 x) 5))) lst))
Both of these functions do the same thing; They convert temperatures from Celsius to Fahrenheit. Multiply by nine, divide by 5, add 32. The first calls map 3 times, one for each step. The second calls it once, and does all 3 steps at the same time.
(define c1 (time (ctof1 random-numbers)))
(define c2 (time (ctof2 random-numbers)))
The time procedure runs whatever is inside it, and prints out runtime stats. The stats aren't used as part of the definition, but they come up in the terminal for me to compare myself. To make sure this works correctly, I added this:
(if (equal? c1 c2)
"Both give the same result"
"Both give different results (This shouldn't happen)")
Every test resulted in both c1 and c2 being the same, which makes sense since they do the same thing. The times came out differently however. On a Thinkpad T420 running Ubuntu 20.04, with DrRacket 7.6, these were the CPU times after 5 runs:
3 maps, simple functions: 63 60 682 23 39
1 map, complex function: 60 40 47 18 19
The 682 was caused by the garbage collector kicking in. I can't explain why the numbers get smaller with every run. Maybe DrRacket has some sort of caching built in.
For now the conclusion seems to be that running map once on a set of data with a complex function is more efficient than running it multiple times with simple ones. It makes sense, since the less you run map the less you have to actually calculate, and the less have to traverse lists. I'll probably try this again with something like Rust in the future.