[performance] Slightly smaller low_mega_apply.
There is now only one instance of the inclusion of apply_low.h (does
anyone else feel a need for less lowness around here?) which actually
made it faster (previously there were two cases, one for scoped
function calls and one for calls without scope).
Also created a new version of mega_apply (named lower_mega_apply) that
can only do APPLY_LOW, and only the most common cases.
It falls back to using the old low_mega_apply when needed.
Changed a lot of call sites to use the low variants so that they
benefit from the optimization.
This actually saves a surprising amount of CPU in code that calls a
lot of small functions: lower_mega_apply() is about 2x faster than
low_mega_apply(APPLY_LOW,...) when it does not hit one of the cases it
does not support (trampolines, calling non-function constants,
variables or arrays).
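
A minimal sketch of the fast-path-with-fallback shape described above.
It is not the actual interpreter code: every name in it
(sketch_fast_apply, sketch_general_apply, struct sketch_callee, the
enum values, add_one) is hypothetical, and the real calling convention
is assumed to be far more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee kinds; only plain functions take the fast path. */
enum sketch_kind {
    SKETCH_FUNCTION,
    SKETCH_TRAMPOLINE,
    SKETCH_CONSTANT
};

struct sketch_callee {
    enum sketch_kind kind;
    int (*fn)(int);
};

/* Counts how often the general path had to be used. */
static int fallback_calls = 0;

/* Stand-in for the general entry point: handles every kind. */
static int sketch_general_apply(const struct sketch_callee *c, int arg)
{
    fallback_calls++;
    switch (c->kind) {
    case SKETCH_FUNCTION:
    case SKETCH_TRAMPOLINE:
        assert(c->fn != NULL);
        return c->fn(arg);
    default:
        return arg; /* placeholder for the remaining cases */
    }
}

/* Stand-in for the restricted fast path: returns 1 and stores the
 * result if it handled the call, 0 if the caller must fall back. */
static int sketch_fast_apply(const struct sketch_callee *c, int arg,
                             int *result)
{
    if (c->kind != SKETCH_FUNCTION)
        return 0; /* unsupported case, let the general path do it */
    assert(c->fn != NULL);
    *result = c->fn(arg);
    return 1;
}

/* Try the fast path first, fall back to the general one when needed. */
static int sketch_apply(const struct sketch_callee *c, int arg)
{
    int result;
    if (sketch_fast_apply(c, arg, &result))
        return result;
    return sketch_general_apply(c, arg);
}

/* Sample callee used for illustration only. */
static int add_one(int x)
{
    return x + 1;
}
```

Calling sketch_apply() on a SKETCH_FUNCTION callee never touches the
general path, while a SKETCH_TRAMPOLINE callee is rejected by the fast
path and handled by the fallback, which is the trade-off described
above: the common case gets cheaper at the cost of a second, slightly
different copy of the call logic.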
There is unfortunately now even more code duplication around, but
since the code is slightly different, that is rather hard to avoid.