[performance] Slightly smaller low_mega_apply. apply_low.h is now included only once (does anyone else feel a need for less lowness around here?), which actually made it faster. Previously it was included twice, once for scoped function calls and once for calls without scope. Also added a new variant of mega_apply, named lower_mega_apply, that can only do APPLY_LOW and handles only the most common cases; it falls back to the old low_mega_apply when needed. Added a lot of lowness in various places to make use of the optimization. This saves a surprising amount of CPU in code that calls many small functions: lower_mega_apply() is about 2x faster than low_mega_apply(APPLY_LOW,...) as long as it does not hit one of the cases it does not support (trampolines, and calls to non-function constants, variables or arrays). Unfortunately there is now even more code duplication, but since the duplicated code differs slightly in each place, that is rather hard to avoid.
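
For readers unfamiliar with the pattern, the fast-path-with-fallback idea can be sketched roughly as below. This is only an illustration, not the actual interpreter code: the names callee_kind, struct callee, general_apply, fast_apply and the two counters are all hypothetical, and the real low_mega_apply/lower_mega_apply operate on interpreter frames and a value stack rather than plain C function pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee kinds, loosely mirroring the cases named above. */
enum callee_kind {
    CALLEE_FUNCTION,   /* plain function: the common case */
    CALLEE_TRAMPOLINE, /* not supported by the fast path */
    CALLEE_CONSTANT    /* non-function constant: not supported either */
};

struct callee {
    enum callee_kind kind;
    int (*fn)(int); /* used by FUNCTION and TRAMPOLINE */
    int value;      /* used by CONSTANT */
};

/* Counters so a caller can observe which path handled a call. */
static int fast_calls = 0;
static int slow_calls = 0;

/* Stand-in for the general routine: handles every kind, at the
 * cost of a dispatch on each call. */
static int general_apply(const struct callee *c, int arg)
{
    assert(c != NULL);
    slow_calls++;
    switch (c->kind) {
    case CALLEE_FUNCTION:
    case CALLEE_TRAMPOLINE:
        return c->fn != NULL ? c->fn(arg) : c->value;
    default:
        return c->value;
    }
}

/* Stand-in for the restricted routine: handles only the common
 * case inline and defers everything else to general_apply(). */
static int fast_apply(const struct callee *c, int arg)
{
    assert(c != NULL);
    if (c->kind == CALLEE_FUNCTION && c->fn != NULL) {
        fast_calls++;
        return c->fn(arg);
    }
    return general_apply(c, arg); /* unsupported case: fall back */
}

/* Small example callee for trying the two paths. */
static int add_one(int x)
{
    return x + 1;
}
```

Callers that know they are in the common situation would call fast_apply() directly; correctness does not depend on that guess being right, since every unsupported case still ends up in general_apply(). The price is the one mentioned above: the common-case logic exists in two slightly different copies.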