
2014-06-22

2014-06-22 15:56:26 by Per Hedbor <ph@opera.com>

Significantly faster stack handling in many cases

This is done by declaring Pike_interpreter_pointer to be const.
That is, obviously, not quite true: different threads have different
values for the pointer, but the value never changes during the
lifetime of a thread.

These changes make it const everywhere except in interpret.c and
threads.c. If the variable were moved to threads.c, it could probably
be const in interpret.c as well.
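
The conditional qualifier can be sketched in a single file as follows.
This is a minimal illustration, not Pike's actual header: interp_state,
INTERP_PTR_QUAL, push_long and the other names are hypothetical, and the
demo never swaps the pointer the way a real thread switch would.

```c
#include <assert.h>

/* Hypothetical stand-in for the interpreter state; not Pike's real struct. */
struct interp_state {
    long *stack_pointer;
};

/* Only the file that switches threads would define IN_THREAD_SWITCHING
 * before this point, and so see a writable pointer. Every other file
 * sees a const pointer. */
#ifdef IN_THREAD_SWITCHING
#define INTERP_PTR_QUAL
#else
#define INTERP_PTR_QUAL const
#endif

#define DEMO_STACK_SIZE 8

static long demo_stack[DEMO_STACK_SIZE];
static struct interp_state demo_state = { demo_stack };

/* The pointer variable is const; the state it points to stays mutable. */
static struct interp_state *INTERP_PTR_QUAL interp_pointer = &demo_state;

#define DEMO_STACK_TOP (interp_pointer->stack_pointer)

/* Number of values currently on the demo stack. */
long stack_depth(void)
{
    return (long)(DEMO_STACK_TOP - demo_stack);
}

/* Push one value by writing through the const pointer. */
void push_long(long v)
{
    assert(stack_depth() < DEMO_STACK_SIZE);
    *DEMO_STACK_TOP++ = v;
}
```

Calling push_long(7) and then push_long(9) leaves stack_depth() at 2.
Because the pointer itself is const, the compiler may load it once and
keep it in a register across the stores made through it.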

This generates fully working code on at least one architecture with
one compiler. ;)

The gain is fairly substantial: the binary shrinks by about 10%, and
the code is faster, since it no longer has to re-read
Pike_interpreter_pointer every time the stack is used. Only that one
step is skipped; the stack pointer itself is still re-read after a
function has been called.

push_int(0), push_int(1), push_int(2) before:

| movq Pike_interpreter_pointer(%rip), %rax
| movq (%rax), %rcx
| leaq 16(%rcx), %rdx
| movq %rdx, (%rax)
| movq $0, 8(%rcx)
| movq $0, (%rcx)
| movq Pike_interpreter_pointer(%rip), %rax
| movq (%rax), %rcx
| leaq 16(%rcx), %rdx
| movq %rdx, (%rax)
| movq $0, (%rcx)
| movq $1, 8(%rcx)
| movq Pike_interpreter_pointer(%rip), %rax
| movq (%rax), %rcx
| leaq 16(%rcx), %rdx
| movq %rdx, (%rax)
| movq $0, (%rcx)
| movq $2, 8(%rcx)

And after:

| movq Pike_interpreter_pointer(%rip), %rax
| movq (%rax), %rcx
| movq $0, (%rcx)
| movq $0, 8(%rcx)
| movq $0, 16(%rcx)
| movq $1, 24(%rcx)
| leaq 48(%rcx), (%rax)
| movq $0, 32(%rcx)
| movq $2, 40(%rcx)
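
For reference, a C sequence of roughly the shape that produces listings
like the two above might look as follows. It is a sketch under
assumptions, not Pike's real push_int: demo_svalue is a simplified
16-byte stand-in for struct svalue, DEMO_PUSH_INT is a hypothetical
macro, and the integer type tag is taken to be 0 because the listings
store $0 in the first word.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical svalue: an 8-byte type word followed by an
 * 8-byte payload, giving the 16-byte stride seen in the listings. */
struct demo_svalue {
    uint64_t type;
    int64_t integer;
};

struct demo_interp {
    struct demo_svalue *stack_pointer;
};

static struct demo_svalue demo_values[16];
static struct demo_interp demo_interp_state = { demo_values };

/* Const pointer: the compiler may assume it is never reassigned. */
static struct demo_interp *const demo_interp_pointer = &demo_interp_state;

#define DEMO_SP (demo_interp_pointer->stack_pointer)
#define DEMO_T_INT 0 /* assumption: integer type tag, as in the listings */

/* Fill in the slot through a local copy of the stack pointer, then
 * write the advanced pointer back. */
#define DEMO_PUSH_INT(I) do {              \
        struct demo_svalue *s_ = DEMO_SP;  \
        s_->type = DEMO_T_INT;             \
        s_->integer = (I);                 \
        DEMO_SP = s_ + 1;                  \
    } while (0)

/* Three consecutive pushes, as in the listings. */
void push_three(void)
{
    DEMO_PUSH_INT(0);
    DEMO_PUSH_INT(1);
    DEMO_PUSH_INT(2);
}
```

With the pointer declared const, an optimizing compiler is free to load
it once and fold the three stack pointer updates into a single store.
Whether it actually does so depends on the compiler and its flags.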

496:
     SET_SVAL_TYPE_SUBTYPE(*s_, PIKE_T_INT,NUMBER_UNDEFINED); \
     s_->u.integer=0; \
     } \
-  Pike_sp=s_; \
+  Pike_sp=(struct svalue*)s_; \
   }while(0)

861:
  struct callback;
  PMOD_EXPORT extern struct callback_list evaluator_callbacks;

- /* Things to try:
-  * we could reduce thread swapping to a pointer operation if
-  * we do something like:
-  * #define Pike_interpreter (*Pike_interpreter_pointer)
-  *
-  * Since global variables are usually accessed through indirection
-  * anyways, it might not make any speed differance.
-  *
-  * The above define could also be used to facilitate dynamic loading
-  * on Win32..
-  */
- PMOD_EXPORT extern struct Pike_interpreter_struct *Pike_interpreter_pointer;
+ PMOD_EXPORT extern struct Pike_interpreter_struct *
+ #ifndef IN_THREAD_SWITCHING
+ const
+ #endif
+ Pike_interpreter_pointer;
+
  #define Pike_interpreter (*Pike_interpreter_pointer)

  #define Pike_sp Pike_interpreter.stack_pointer