Here's a patch. I've tested it only minimally, by running (par-for-each (lambda (x) (monitor (sleep 1) (display "foo\n"))) (iota 10)) on a quad-core machine. Before the patch, it printed the "foo"s in groups of four, one second apart. With the patch, it prints them one at a time, one second apart. That is the correct behaviour, since a monitor form should let only one thread at a time execute its body.
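For comparison, the serialization that monitor is supposed to provide can be written out by hand. The following is a minimal sketch, assuming Guile with the (ice-9 threads) module. It holds one explicit mutex (the hypothetical name shared-mutex) around every body. It also keeps two illustrative counters, inside and max-inside, which are not part of the patch, to record how many bodies overlap. If the locking works, max-inside ends up as 1 and the "foo"s appear one per second, as in the patched behaviour.

```scheme
(use-modules (ice-9 threads))   ; par-for-each, with-mutex

;; Hypothetical illustration: a single mutex shared by all iterations,
;; i.e. the serialization one monitor call site is expected to give.
(define shared-mutex (make-mutex))

;; Illustrative counters (not from the patch): how many bodies are
;; inside the critical section right now, and the most ever seen.
;; They are only touched while shared-mutex is held.
(define inside 0)
(define max-inside 0)

(par-for-each
 (lambda (x)
   (with-mutex shared-mutex
     (set! inside (+ inside 1))
     (set! max-inside (max max-inside inside))
     (sleep 1)                  ; hold the lock for one second
     (display "foo\n")
     (set! inside (- inside 1))))
 (iota 10))
```

Replacing the with-mutex form with a freshly made mutex per call, e.g. (with-mutex (make-mutex) ...), would remove the mutual exclusion and let several bodies run at once.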