- pike/src/apply_low.h (+5/-10)(15 lines)
- pike/src/interpret.c (+185/-15)(200 lines)
- pike/src/interpret.h (+6/-2)(8 lines)
- pike/src/interpret_functions.h (+102/-22)(124 lines)
- pike/src/pike_macros.h (+12/-0)(12 lines)
[performance] Slightly smaller low_mega_apply.
There is now only one inclusion of apply_low.h (does
anyone else feel a need for less lowness around here?), which actually
made it faster. Previously there were two: one for scoped
function calls and one for calls without scope.
Also created a new version of mega_apply (named lower_mega_apply) that
can only do APPLY_LOW, and only the most common cases.
It falls back to using the old low_mega_apply when needed.
Added a lot of lowness to places to utilize the optimization.
This actually saves a surprising amount of CPU in code that calls a lot
of small functions: lower_mega_apply() is about 2x faster than
low_mega_apply(APPLY_LOW,...) as long as it does not hit one of the
cases it does not support (trampolines, and calls to non-function
constants, variables or arrays).
There is unfortunately now even more code duplication around, but
since the code is slightly different that is rather hard to avoid.
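
A minimal sketch of the fast-path-with-fallback shape described above.
It is not the actual lower_mega_apply code: the callee kinds, struct
layout and function names are hypothetical and only illustrate how a
restricted fast path can hand unsupported cases to the general routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee kinds; these are NOT Pike's real identifier flags. */
enum callee_kind {
    CALLEE_PLAIN_FUNCTION,
    CALLEE_TRAMPOLINE,
    CALLEE_CONSTANT
};

struct callee {
    enum callee_kind kind;
    int (*fn)(int);
};

static int fast_calls = 0;   /* calls served by the fast path */
static int slow_calls = 0;   /* calls served by the general path */

/* General path: handles every callee kind, at a higher cost per call. */
static int general_apply(const struct callee *c, int arg)
{
    slow_calls++;
    switch (c->kind) {
    case CALLEE_PLAIN_FUNCTION:
    case CALLEE_TRAMPOLINE:
        return c->fn(arg);
    case CALLEE_CONSTANT:
        return arg;          /* stand-in for "not really a function call" */
    }
    return 0;
}

/* Fast path: handles only the most common case.
 * Returns 1 if the call was handled, 0 if the caller must fall back. */
static int fast_apply(const struct callee *c, int arg, int *result)
{
    assert(c != NULL && result != NULL);
    if (c->kind != CALLEE_PLAIN_FUNCTION)
        return 0;
    fast_calls++;
    *result = c->fn(arg);
    return 1;
}

/* Entry point: try the fast path first, fall back when it declines. */
static int apply(const struct callee *c, int arg)
{
    int result;
    if (fast_apply(c, arg, &result))
        return result;
    return general_apply(c, arg);
}

/* Tiny example callee. */
static int add_one(int x) { return x + 1; }
```

The point of the split is that the fast path can stay small because it
never needs to understand the rare cases, only to recognise them.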
[performance] Speed up the low_return function noticeably.
This code reordering/redundant test removal makes the function about 10% faster.
[performance] Use the hashtable more when indexing objects
Now it is used even if there is only one identifier in the object.
That helps more than it should, really.
[performance] When setting object variables, only check svalue_is_zero if needed
This speeds up assignment for a lot of object variable types, since
svalue_is_zero is (comparatively) expensive.
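
A rough sketch of the idea, assuming the expensive zero test is only
needed when the assigned value does not already match the type of the
variable. That reading is an assumption; the types and names below are
hypothetical and are not Pike's svalue API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value types, not Pike's real type tags. */
enum slot_type { SV_INT, SV_STRING };

struct slot_value {
    enum slot_type type;
    long i;
    const char *s;
};

static int zero_checks = 0;  /* counts how often the costly test runs */

/* Stand-in for the comparatively expensive zero test. */
static int value_is_zero(const struct slot_value *v)
{
    zero_checks++;
    return v->type == SV_INT && v->i == 0;
}

/* Assign v to a typed slot. Returns 1 on success, 0 on a type error.
 * The cheap type comparison runs first; because of short-circuit
 * evaluation the zero test only runs when the types differ. */
static int assign_typed(struct slot_value *slot, enum slot_type slot_type,
                        const struct slot_value *v)
{
    assert(slot != NULL && v != NULL);
    if (v->type != slot_type && !value_is_zero(v))
        return 0;
    *slot = *v;
    return 1;
}
```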
- pike/src/interpret.c (+167/-63)(230 lines)
- pike/src/interpret.h (+10/-3)(13 lines)
- pike/src/pike_embed.c (+3/-2)(5 lines)
[performance] Do not use block-alloc for pike_frame and catch_context
They are too important for code execution speed.
struct pike_frames are allocated in chunks but not freed until the
program exits. This is basically just like the normal stack, and for
all but the most extreme of recursive programs this is not really an
issue. For those programs the only loss is that the frame memory is
not returned to the system; peak memory use is actually lower than
before.
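
As an illustration only, chunked allocation without per-frame free()
could look roughly like the sketch below. The chunk size, struct layout
and function names are made up for the example and do not come from
interpret.c.

```c
#include <assert.h>
#include <stdlib.h>

#define FRAMES_PER_CHUNK 64  /* hypothetical chunk size */

struct frame {
    struct frame *next;      /* links unused frames together */
    int locals[8];           /* placeholder payload */
};

static struct frame *free_frames = NULL;
static int chunks_allocated = 0;

/* Allocate one chunk and thread all of its frames onto the free list.
 * Chunks are never handed back to the system. */
static int grow_frames(void)
{
    struct frame *chunk = malloc(FRAMES_PER_CHUNK * sizeof *chunk);
    if (!chunk)
        return 0;
    chunks_allocated++;
    for (int i = 0; i < FRAMES_PER_CHUNK; i++) {
        chunk[i].next = free_frames;
        free_frames = &chunk[i];
    }
    return 1;
}

/* Pop a frame from the free list, growing it by one chunk if empty. */
static struct frame *frame_alloc(void)
{
    if (!free_frames && !grow_frames())
        return NULL;
    struct frame *f = free_frames;
    free_frames = f->next;
    return f;
}

/* Releasing a frame only pushes it back onto the free list. */
static void frame_release(struct frame *f)
{
    assert(f != NULL);
    f->next = free_frames;
    free_frames = f;
}
```

Both operations are a couple of pointer moves in the common case, which
is what makes this behave like a stack.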
The catch_context structures (that are fairly large, anyway, 80 bytes
on my machine) are simply allocated using malloc, and up to 100 free
ones are kept in a list for quick use.
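
A small sketch of a malloc-backed allocator that keeps up to 100 free
structures on a list, as described above. The struct contents and the
function names are placeholders, not the real catch_context code.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FREE_CONTEXTS 100  /* cap on cached free structures */

struct ctx {
    struct ctx *next;          /* link used while on the free list */
    char payload[72];          /* placeholder for the real fields */
};

static struct ctx *free_list = NULL;
static int num_free = 0;

/* Reuse a cached structure if one is available, else call malloc. */
static struct ctx *ctx_alloc(void)
{
    struct ctx *c = free_list;
    if (c) {
        free_list = c->next;
        num_free--;
    } else {
        c = malloc(sizeof *c);
        if (!c)
            return NULL;
    }
    c->next = NULL;
    return c;
}

/* Cache the structure for quick reuse unless the list is already full,
 * in which case it goes back to the system allocator. */
static void ctx_free(struct ctx *c)
{
    assert(c != NULL);
    if (num_free < MAX_FREE_CONTEXTS) {
        c->next = free_list;
        free_list = c;
        num_free++;
    } else {
        free(c);
    }
}
```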
[performance] Significantly faster is_lt and svalue_is_true
The is_lt function now uses no stack at all, which speeds it up by
about a factor of ten (for the case where both arguments are integers).
Much the same was done for svalue_is_true.
Also, the order of the tests was rearranged to get some other
Interestingly enough, is_lt is actually faster even for the complex
cases now; for whatever reason gcc seems to generate better code.
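
For illustration, an integer fast path in a less-than comparison might
be structured as below. This is a sketch under assumptions: the value
representation and helper names are hypothetical, and the real is_lt
handles many more types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged value, not Pike's struct svalue. */
enum num_type { NV_INT, NV_FLOAT };

struct num_value {
    enum num_type type;
    union {
        long integer;
        double float_number;
    } u;
};

static int generic_calls = 0;  /* counts trips through the slow path */

static struct num_value make_int(long i)
{
    struct num_value v;
    v.type = NV_INT;
    v.u.integer = i;
    return v;
}

static struct num_value make_float(double d)
{
    struct num_value v;
    v.type = NV_FLOAT;
    v.u.float_number = d;
    return v;
}

static double as_double(const struct num_value *v)
{
    return v->type == NV_INT ? (double)v->u.integer : v->u.float_number;
}

/* Slow path: mixed or non-integer operands. */
static int is_lt_generic(const struct num_value *a, const struct num_value *b)
{
    generic_calls++;
    return as_double(a) < as_double(b);
}

/* Test the most common case first: two integers need one type check
 * and one comparison, with no call into the generic routine. */
static int is_lt_sketch(const struct num_value *a, const struct num_value *b)
{
    assert(a != NULL && b != NULL);
    if (a->type == NV_INT && b->type == NV_INT)
        return a->u.integer < b->u.integer;
    return is_lt_generic(a, b);
}
```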