Faster string_to_utf8()

This commit splits the length calculation of the resulting string and
the actual encoding into two new functions. This makes it possible to
e.g. encode UTF-8 directly into a buffer.

The length calculation has been rewritten for the different shift
sizes. For 8-bit strings it uses a popcount loop, which counts the
high bits (code points above 0x7f) in machine-size chunks. On machines
with a popcount instruction this is much faster. With compilers that
do not support __builtin_popcount it falls back to a simple manual
popcount. For 16-bit and 32-bit strings the length calculation uses
clz to count the significant bits of each code point, from which the
encoded length follows without branches. (Sketches of both approaches
follow the benchmark table below.)

The encoding function is split into one version per shift size. For
32-bit strings it avoids branches by using the resulting byte length
as a jump size, which generates reasonable code, at least with gcc.
(See the last sketch below.)

Benchmark results on my i7 (throughput before | after, each with a
deviation percentage):

    utf8/code.pike#encode_7bit      |   1.3 G  1.6 % |   8.3 G  3.4 % |
    utf8/code.pike#encode_8bit      | 651.1 M  1.8 % |   1.1 G  1.2 % |
    utf8/code.pike#encode_arabic    | 498.4 M  0.8 % | 710.3 M  1.2 % |
    utf8/code.pike#encode_bulgarian | 488.2 M  1.2 % | 688.4 M  2.6 % |
    utf8/code.pike#encode_estonian  | 614.8 M  6.6 % | 969.5 M  1.5 % |
    utf8/code.pike#encode_hebrew    | 496.9 M  1.8 % | 710.1 M  1.0 % |
    utf8/code.pike#encode_japanese  | 704.9 M  4.0 % | 785.4 M  1.6 % |
    utf8/code.pike#encode_polish    | 388.9 M  0.4 % | 710.1 M  1.3 % |
    utf8/code.pike#encode_thai      | 642.8 M  3.3 % | 858.0 M  0.9 % |
    utf8/code.pike#encode_yiddish   | 485.9 M  3.3 % | 692.5 M  3.8 % |

I also tested on arm32, where the speedups are around 50%.
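
A minimal sketch of the popcount length pass for 8-bit strings; the
function and variable names are illustrative, not the commit's actual
identifiers:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper, not the commit's actual function name. */
    static size_t utf8_len_8bit(const unsigned char *s, size_t n)
    {
        size_t len = n, i = 0;
        /* High bit of every byte; the cast truncates the mask
           correctly on 32-bit machines. */
        const uintptr_t mask = (uintptr_t)0x8080808080808080ULL;

        /* Count code points above 0x7f in machine-size chunks; each
           one needs a second UTF-8 byte. */
        for (; i + sizeof(uintptr_t) <= n; i += sizeof(uintptr_t)) {
            uintptr_t chunk;
            memcpy(&chunk, s + i, sizeof chunk);
            chunk &= mask;
    #ifdef __GNUC__
            len += (size_t)__builtin_popcountll(chunk);
    #else
            /* Simple manual popcount: clear the lowest set bit. */
            while (chunk) { chunk &= chunk - 1; len++; }
    #endif
        }

        /* Remaining tail bytes, one at a time. */
        for (; i < n; i++)
            len += s[i] >> 7;

        return len;
    }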
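
And a sketch of the clz-based, branchless length for 16/32-bit code
points; the lookup table and names are mine, and surrogate/error
handling is left out:

    #include <stdint.h>

    /* Hypothetical helper: UTF-8 sequence length of one code point. */
    static unsigned utf8_seq_len(uint32_t c)
    {
        /* Significant bits in c; c | 1 avoids clz(0), which is
           undefined. */
        unsigned bits = 32 - (unsigned)__builtin_clz(c | 1);

        /* Length by bit count: 1-7 bits -> 1 byte, 8-11 -> 2,
           12-16 -> 3, 17-21 -> 4.  No compare-and-branch chain. */
        static const unsigned char len[33] = {
            1,1,1,1,1,1,1,1,   /* 0-7 bits   */
            2,2,2,2,           /* 8-11 bits  */
            3,3,3,3,3,         /* 12-16 bits */
            4,4,4,4,4,         /* 17-21 bits; wider is not Unicode */
        };
        return len[bits];
    }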
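
One way to realize the "byte length as a jump size" idea in the 32-bit
encoder is a fallthrough switch, which gcc compiles to a jump table;
this sketch (again with made-up names, no error handling) writes the
continuation bytes back to front:

    #include <stdint.h>

    /* Hypothetical helper; writes one code point and returns the new
       write position.  utf8_seq_len() is the clz-based helper
       sketched above. */
    static unsigned char *utf8_put(unsigned char *out, uint32_t c)
    {
        unsigned len = utf8_seq_len(c);

        /* The length selects the entry point; each case emits one
           continuation byte and falls through to the next. */
        switch (len) {
        case 4: out[3] = 0x80 | (c & 0x3f); c >>= 6; /* fallthrough */
        case 3: out[2] = 0x80 | (c & 0x3f); c >>= 6; /* fallthrough */
        case 2: out[1] = 0x80 | (c & 0x3f); c >>= 6;
                /* Leading byte: 0xc0, 0xe0 or 0xf0 prefix. */
                out[0] = (unsigned char)(0xff << (8 - len)) | c;
                break;
        case 1: out[0] = (unsigned char)c;
        }
        return out + len;
    }

Every iteration of the hot loop then takes the same path up to the
switch, so there is a single indirect jump per code point instead of a
ladder of range comparisons.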