2014-06-19

2014-06-19 15:55:43 by Per Hedbor <ph@opera.com>

Faster svalue type/subtype setting

The code generated for setting two adjacent shorts (at least on x86,
x86_64 and ARM) is far from optimal, especially with gcc:

Old push_int(0):
movq Pike_interpreter_pointer(%rip), %rdx
movq (%rdx), %rax
leaq 16(%rax), %rcx
movq %rcx, (%rdx)
xorl %edx, %edx
xorl %ecx, %ecx
movw %dx, (%rax)
movw %cx, 2(%rax)
movq $0, 8(%rax)

New push_int(0):
movq Pike_interpreter_pointer(%rip), %rdx
movq (%rdx), %rax
leaq 16(%rax), %rcx
movq %rcx, (%rdx)
movq $0, (%rax)
movq $0, 8(%rax)

Apart from the lower instruction count there is an additional
benefit: the old code triggered a read-modify-write operation on most
modern x86 CPUs, all to preserve the undefined data between the subtype
and the value of the svalue. This could also be fixed by making the type
and subtype 32-bit instead of 16-bit, but that is a bigger
change.
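
The difference between the two listings can be sketched in C roughly as
follows. This is a minimal illustration, not the actual Pike source: the
struct layout and every name in it (demo_svalue, demo_pack,
demo_set_two_stores, demo_set_one_store) are hypothetical, and it assumes
a 64-bit word that overlays the two shorts and the padding after them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an svalue: two 16-bit tags, padding, then a
 * 64-bit payload. Names and layout are illustrative assumptions. */
struct demo_svalue {
  union {
    struct { uint16_t type, subtype; } t; /* the two shorts */
    uint64_t word;                         /* same bytes plus the padding */
  } tu;
  int64_t value;
};

/* Build the combined word through the union, so the result is correct
 * for either byte order. With constant arguments a compiler can usually
 * fold this into a single immediate. */
static inline uint64_t demo_pack(uint16_t type, uint16_t subtype)
{
  struct demo_svalue tmp;
  tmp.tu.word = 0;
  tmp.tu.t.type = type;
  tmp.tu.t.subtype = subtype;
  return tmp.tu.word;
}

/* Old style: two separate 16-bit stores; the padding is not written. */
static inline void demo_set_two_stores(struct demo_svalue *sv,
                                       uint16_t type, uint16_t subtype)
{
  sv->tu.t.type = type;
  sv->tu.t.subtype = subtype;
}

/* New style: one full-width store that also overwrites the padding. */
static inline void demo_set_one_store(struct demo_svalue *sv,
                                      uint16_t type, uint16_t subtype)
{
  sv->tu.word = demo_pack(type, subtype);
}
```

Both helpers leave the same type and subtype in place. The single-store
variant additionally defines the padding bytes, which is what lets the
compiler emit one full-width move instead of two 16-bit ones.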

2432:
  TEST_BUILTIN_VOID(__builtin_unreachable,[])
  TEST_BUILTIN_VOID(__builtin_expect,[1,1])
+ TEST_BUILTIN(__builtin_memset, [&foo,0,0])
  TEST_BUILTIN(__builtin_clz, 23)
  TEST_BUILTIN(__builtin_clzl, 23)
  TEST_BUILTIN(__builtin_clzll, 23)