Added yet more 'native' opcodes for AMD64/x86_64.

Added inline versions of F_DUP, F_SWAP, F_LOOP and F_LOCAL_2_LOCAL.
This almost doubled the speed of the 'Loops Nested (local)' benchmark.