[performance] Fixed the local+local and some other opcodes
They now use destructive operations when possible.
Also added an inline version of string+string to the local+=local opcode.
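
A minimal sketch of what a "destructive" add could look like, assuming a
reference-counted string. The struct rstr type and the rstr_* helpers are
hypothetical names for illustration only, not the interpreter's actual API.
When the left operand has a single reference it is grown in place;
otherwise a fresh string is built and the shared one is left untouched.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string, for illustration only. */
struct rstr {
    int refs;
    size_t len;
    char *data;   /* always NUL-terminated */
};

static struct rstr *rstr_new(const char *s)
{
    struct rstr *r = malloc(sizeof *r);
    if (!r) return NULL;
    r->refs = 1;
    r->len = strlen(s);
    r->data = malloc(r->len + 1);
    if (!r->data) { free(r); return NULL; }
    memcpy(r->data, s, r->len + 1);
    return r;
}

/* Returns a + b and consumes the caller's reference to a.
 * On allocation failure returns NULL and leaves a untouched. */
static struct rstr *rstr_add(struct rstr *a, const struct rstr *b)
{
    size_t blen = b->len;

    if (a->refs == 1) {
        /* Sole owner: nobody else can observe a, so grow it in place. */
        char *p = realloc(a->data, a->len + blen + 1);
        if (!p) return NULL;
        /* If b aliases a, its old data pointer is stale after realloc. */
        const char *src = (b == a) ? p : b->data;
        memcpy(p + a->len, src, blen);
        p[a->len + blen] = '\0';
        a->data = p;
        a->len += blen;
        return a;
    }

    /* Shared: build a new string and drop our reference to a. */
    struct rstr *r = malloc(sizeof *r);
    if (!r) return NULL;
    r->data = malloc(a->len + blen + 1);
    if (!r->data) { free(r); return NULL; }
    memcpy(r->data, a->data, a->len);
    memcpy(r->data + a->len, b->data, blen);
    r->data[a->len + blen] = '\0';
    r->refs = 1;
    r->len = a->len + blen;
    a->refs--;
    return r;
}
```

The in-place branch avoids one allocation and one copy of the left operand,
which is presumably where the gain comes from in loops that keep appending
to the same local.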
[performance] Unroll the crc32si, and only xor once.
This more than doubled the hashing speed, but makes even more
assumptions about how the function is called.
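
A rough sketch of the unrolling idea, under stated assumptions: crc32si is
taken to be a per-word CRC32 step, and crc32c_word() below is a
hypothetical software stand-in for it. "Only xor once" is read here as
folding the seed in a single time after the loop, which is a guess at the
intent. The unrolled variant assumes the word count is a multiple of four,
the kind of caller-side assumption the entry mentions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one CRC32C step over a 32-bit word. */
static uint32_t crc32c_word(uint32_t crc, uint32_t word)
{
    crc ^= word;
    for (int i = 0; i < 32; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Reference version: one word per iteration. */
static uint32_t hash_simple(const uint32_t *w, size_t nwords, uint32_t seed)
{
    uint32_t h = 0;
    for (size_t i = 0; i < nwords; i++)
        h = crc32c_word(h, w[i]);
    return h ^ seed;
}

/* Unrolled version: four words per iteration.
 * ASSUMPTION: nwords is a multiple of 4; the caller must guarantee it. */
static uint32_t hash_unrolled(const uint32_t *w, size_t nwords, uint32_t seed)
{
    uint32_t h = 0;
    for (size_t i = 0; i < nwords; i += 4) {
        h = crc32c_word(h, w[i]);
        h = crc32c_word(h, w[i + 1]);
        h = crc32c_word(h, w[i + 2]);
        h = crc32c_word(h, w[i + 3]);
    }
    return h ^ seed;   /* seed folded in once, after the loop */
}
```

Both functions return the same value whenever the length assumption holds;
the unrolled one only saves loop overhead per word.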
[performance] Some tweaks to stralloc to improve performance
Increased the hash-size significantly.
It now aims for one string per bucket instead of four.
Changed to use a single short_string block allocator. Wide short
strings simply hold fewer characters now.
Also, do not re-order the chains in findstring.
Update the flags in realloc_shared_string so the code does not have to
be duplicated in the two places the function is used.
Changed the switch to if/else in low_set_index.
This made that function about 3x faster, at least when setting indices
in narrow strings (the case that now comes first, and was previously
last in the if/else chain gcc generated from the switch).
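
For illustration, the if/else ordering might look roughly like the sketch
below. The name set_index_sketch, its signature and the size_shift encoding
(0, 1, 2 for 8-, 16- and 32-bit characters) are assumptions; the real
low_set_index is not shown in this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: store one character at position pos.
 * size_shift is assumed to be 0 (8-bit), 1 (16-bit) or 2 (32-bit). */
static void set_index_sketch(void *buf, int size_shift, size_t pos,
                             uint32_t value)
{
    if (size_shift == 0) {
        /* Narrow strings are tested first: assumed to be the common case. */
        ((uint8_t *)buf)[pos] = (uint8_t)value;
    } else if (size_shift == 1) {
        ((uint16_t *)buf)[pos] = (uint16_t)value;
    } else {
        ((uint32_t *)buf)[pos] = value;
    }
}
```

Writing the chain by hand fixes the test order in the source, so the narrow
case is checked first in the source order rather than wherever the compiler
places it when lowering a switch.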