Hoisted some loop invariants out of replace_many(). It is now ~82% as fast as the serial case in the benchmark (up from ~53%).

Rev: src/builtin_functions.c:1.613
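For readers unfamiliar with the technique, here is a minimal C sketch of loop-invariant hoisting in a multi-pattern replace. It is a hypothetical illustration, not the actual replace_many() in src/builtin_functions.c. The name replace_many_sketch, the MAX_PATTERNS limit, and the "first pattern in order wins" matching rule are all assumptions made for this example. The point is that per-pattern lengths never change during the scan, so they are computed once before the loop instead of at every input position.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only: NOT Pike's replace_many().
 * Replaces occurrences of from[i] with to[i], scanning src left to right;
 * patterns are tried in order and the first match wins (an assumption).
 * Returns the output length, or (size_t)-1 if out is too small or n is
 * too large. Assumes every from[i] is non-empty. */
#define MAX_PATTERNS 16

size_t replace_many_sketch(const char *src, const char *const *from,
                           const char *const *to, size_t n,
                           char *out, size_t out_cap)
{
    size_t from_len[MAX_PATTERNS], to_len[MAX_PATTERNS];
    if (n > MAX_PATTERNS || out_cap == 0)
        return (size_t)-1;

    /* Loop invariants: these lengths do not change while scanning src,
     * so compute them once here rather than calling strlen() inside the
     * per-position loop below. */
    for (size_t i = 0; i < n; i++) {
        from_len[i] = strlen(from[i]);
        to_len[i] = strlen(to[i]);
    }
    const size_t src_len = strlen(src); /* also invariant */

    size_t pos = 0, o = 0;
    while (pos < src_len) {
        size_t i;
        for (i = 0; i < n; i++) {
            if (from_len[i] <= src_len - pos &&
                memcmp(src + pos, from[i], from_len[i]) == 0)
                break;
        }
        if (i < n) {
            /* Need room for the replacement plus the final NUL. */
            if (o + to_len[i] >= out_cap)
                return (size_t)-1;
            memcpy(out + o, to[i], to_len[i]);
            o += to_len[i];
            pos += from_len[i];
        } else {
            if (o + 1 >= out_cap)
                return (size_t)-1;
            out[o++] = src[pos++];
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, with from = {"ab", "c"} and to = {"X", "YY"}, the input "abcab" becomes "XYYX". Without the hoisting, each of the n strlen() calls per pattern would be repeated at every input position, which is the kind of redundant work the commit removes.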