[compiler] Significantly faster simple loops
The rationale for the assign_local variants is that it is
significantly faster to do local=local and local+=[number||local]
directly than it is to do local&, number, f_add_to and similar.
The reason is that the locals act much like registers: they are
easy to assign values to from the machine-code level.
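
To illustrate the difference, here is a minimal sketch of a toy stack
machine. It is not the actual interpreter code: toy_vm, push_local_ref,
push_number, add_to and add_local_number are hypothetical names made up
for this example. The generic path goes through the value stack in
three steps, while the fused path updates the local slot directly.

```c
#include <assert.h>

/* Hypothetical toy VM for illustration only, not the real interpreter. */
enum { MAX_STACK = 16, NUM_LOCALS = 4 };

typedef struct {
    long locals[NUM_LOCALS];  /* local variable slots */
    long stack[MAX_STACK];    /* value stack */
    int sp;                   /* index of the next free stack slot */
} toy_vm;

/* Generic path, step 1: push a reference (here: the index) to a local. */
static void push_local_ref(toy_vm *vm, int idx) {
    assert(vm->sp < MAX_STACK);
    vm->stack[vm->sp++] = idx;
}

/* Generic path, step 2: push the number to add. */
static void push_number(toy_vm *vm, long n) {
    assert(vm->sp < MAX_STACK);
    vm->stack[vm->sp++] = n;
}

/* Generic path, step 3: pop both values and add to the referenced local. */
static void add_to(toy_vm *vm) {
    assert(vm->sp >= 2);
    long n = vm->stack[--vm->sp];
    int idx = (int)vm->stack[--vm->sp];
    vm->locals[idx] += n;
}

/* Fused path: a single operation that touches only the local slot. */
static void add_local_number(toy_vm *vm, int idx, long n) {
    vm->locals[idx] += n;
}

/* Add `step` to local 0 `count` times using the three-step sequence. */
static long sum_generic(long count, long step) {
    toy_vm vm = {0};
    for (long i = 0; i < count; i++) {
        push_local_ref(&vm, 0);
        push_number(&vm, step);
        add_to(&vm);
    }
    return vm.locals[0];
}

/* Same computation using the fused operation. */
static long sum_fused(long count, long step) {
    toy_vm vm = {0};
    for (long i = 0; i < count; i++)
        add_local_number(&vm, 0, step);
    return vm.locals[0];
}
```

Both sum_generic(n, s) and sum_fused(n, s) compute the same value; the
fused variant simply skips the stack traffic, which is the kind of
saving the assign_local variants are aiming for.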
Also added some perhaps dubious optimizations of the code that the
treeoptimizer produces for for-loops.
The result of the above is that the NestedLoops* tests are about eight
times faster, and they run entirely in native code, without any function