Compiler [amd64] [arm32] [arm64]: Inline F_DUP and F_SWAP with arg1 != 0. Inlines most cases of F_DUP and F_SWAP on amd64, arm32 and arm64. NB: Not complete. F_SWAP with arg1 != 0 is NOT inlined on arm32.
Compiler: Parameterized the opcodes F_DUP and F_SWAP. They now take an offset from the old implicit argument; an offset of 0 gives the old operation. These are useful for e.g. duplicating the lvalue at the top of the stack: F_DUP(1) F_DUP(1). Previously this required much more complex code: F_DUP F_REARRANGE(1, 2) F_SWAP F_DUP F_REARRANGE(1, 3) F_SWAP.
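A minimal stand-alone C sketch of how I read the new semantics (the stack, sp and dup_at names are illustrative, not Pike's interpreter API): F_DUP(n) pushes a copy of the value n slots below the old implicit target, so F_DUP(0) behaves like the old F_DUP, and F_DUP(1) F_DUP(1) copies both halves of a two-slot lvalue.

    #include <stdio.h>

    static int stack[16];
    static int *sp = stack;              /* sp points just past the top element */

    /* F_DUP(n): push a copy of the element n slots below the top. */
    static void dup_at(int n) { *sp = sp[-1 - n]; sp++; }

    int main(void) {
      *sp++ = 'a'; *sp++ = 'b';          /* a two-slot "lvalue" on top of the stack */
      dup_at(1); dup_at(1);              /* F_DUP(1) F_DUP(1) */
      for (int *p = stack; p < sp; p++)
        printf("%c", *p);                /* prints "abab" */
      printf("\n");
      return 0;
    }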
Work in progress: Sakura master
Compiler: Add optional validation that %rsp is properly aligned. Add new generic configure option `--with-experimental` intended to be used to enable various experimental and/or debug code. Exactly what `--with-experimental` enables is NOT intended to be stable, and may be changed at any time. It may also have no effect at all.
Compiler [amd64]: Keep stack alignment before calling C code. GCC 8 started to emit movaps instructions with (%RSP) as destination, leading to a GPF if it was not properly aligned. Backport of the remainder of the fix, since it is now relevant.
Add --with-mc-stack-frames configure option. (Currently X86-64 only.) This will enable frame pointers in machine code, thereby allowing e.g. Linux perf to unwind the stack and get proper stack traces including Pike functions.
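For context, the frame-pointer chain that tools like perf walk is the classic prologue/epilogue pair sketched below; this is only a conceptual illustration with a toy emit() helper, not the code actually emitted by the Pike amd64 backend.

    #include <stdio.h>

    static void emit(const char *insn) { printf("\t%s\n", insn); }

    /* Conceptual prologue: saves the caller's %rbp and points %rbp at this
     * frame, so an unwinder can follow the chain of saved frame pointers. */
    static void mc_frame_prologue(void) {
      emit("push %rbp");
      emit("mov  %rsp, %rbp");
    }

    static void mc_frame_epilogue(void) {
      emit("pop  %rbp");                 /* undo before returning to the caller */
    }

    int main(void) {
      mc_frame_prologue();
      emit("... function body ...");
      mc_frame_epilogue();
      emit("ret");
      return 0;
    }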
Inline the F_CATCH opcode (on AMD64 so far). This is a prerequisite for MACHINE_CODE_STACK_FRAMES, since inter_return_opcode_F_CATCH will "inject" itself on the C stack when the first F_CATCH opcode is encountered (and won't vanish until inter return, which may occur in an outer Pike frame).
Merge branch 'patches/amd64-broken-debug-F_XOR_INT' into 8.0 * patches/amd64-broken-debug-F_XOR_INT: Compiler [amd64]: Fix indexing out of bounds for F_XOR_INT --with-debug.
Merge branch 'patches/amd64-broken-debug-F_XOR_INT' * patches/amd64-broken-debug-F_XOR_INT: Compiler [amd64]: Fix indexing out of bounds for F_XOR_INT --with-debug.
Compiler [amd64]: Fix indexing out of bounds for F_XOR_INT --with-debug.
Disassembler [amd64]: Added a few more opcodes.
Compiler [amd64]: Inline F_SWAP_STACK_LOCAL.
Compiler and runtime: Added byte codes F_PUSH_CATCHES and F_CATCH_AT. These are needed to be able to save and restore the recovery context for generator functions. Updates the code generators for quite a few machine code backends.
Compiler: Fixed typo in PIKE_DEBUG code.
Runtime: Increase the maximum number of bytecode opcodes to 512. Adds F_INSTR_PREFIX_256. We were very close to the opcode limit...
Runtime: Support F_MARK_AT in generators.
Compiler [WIP]: Experimental implementation of support for generators. NB: The code is known to be broken, and the approach is likely to be changed. WORK IN PROGRESS! DO NOT USE! Do NOT merge into any branches that are likely to be merged into main line!
Build [x86_64]: Fixed typo in previous commit.
Build [x86_64]: Fixed call of string_builder_append_disassembly(). Fall out from the recent API change.
Merge remote-tracking branch 'origin/master' into new_utf8
Merge commit '722771973bd' into patches/lyslyskom22891031 * commit '722771973bd': (6177 commits) Verify that callablep responses are aligned with reality. ...
Compiler: Silence compiler warnings. GCC 8 got more picky about function pointer signatures, but there is really no need to let those warnings bother us.
Merge commit '2470270f500c728d10b8895314d8d8b07016e37b' into grubba/typechecker-automap * commit '2470270f500c728d10b8895314d8d8b07016e37b': (18681 commits) Removed the old typechecker. ...
Merge remote-tracking branch 'origin/8.1' into gobject-introspection
Compiler [amd64]: Keep stack alignment before calling C code. GCC 8 started to emit movaps instructions with (%RSP) as destination, leading to a GPF if it was not properly aligned.
Build [amd64]: Fixed warning.
Interpreter: fixed handling of the SAVE_LOCALS bitmask. Since the introduction of save_locals_bitmask, expendible_offset was never set. Also, since expendible_offset and save_locals_bitmask were handled by the same case, the code was broken. During pop entries, handling of the save_locals bitmask could lead to situations where locals above expendible_offset were 'copied' into the trampoline frame, even though those locals could already have been popped from the stack by the RETURN_LOCAL opcode. Also slightly refactored the code to not allocate more space for locals than needed and removed some unnecessary casts. This became visible and could lead to crashes when building for 32-bit on 64-bit x86 machines.
Merge commit '75c9d1806f1a69ca21c27a2c2fe1b4a6ea38e77e' into patches/pike63 * commit '75c9d1806f1a69ca21c27a2c2fe1b4a6ea38e77e': (19587 commits) ...
Fix spelling of FALLTHRU directive. The non-standard spelling "FALL_THROUGH" is not recognized by gcc 7.3. Also, the comment must not contain any other text, nor be placed inside braces.
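A small illustration of the accepted form (the switch itself is made up; only the comment placement matters): gcc's -Wimplicit-fallthrough recognizes /* FALLTHRU */ when it is the only text in the comment and sits immediately before the next case label, outside any braces.

    #include <stdio.h>

    static int classify(int op) {
      int weight = 0;
      switch (op) {
      case 1:
        weight++;
        /* FALLTHRU */
      case 2:
        weight++;
        break;
      default:
        break;
      }
      return weight;
    }

    int main(void) {
      printf("%d %d\n", classify(1), classify(2));   /* prints "2 1" */
      return 0;
    }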
Interpreter: merge low_return variants
Merge branch 'grubba/rename_lfun_destroy' into 8.1 * grubba/rename_lfun_destroy: Modules: Fixed lots of warnings. Testsuite: Updated for LFUN::_destruct(). Compiler: Don't complain about LFUN::destroy() in compat mode. Fix multiple warnings. Runtime: LFUN::destroy() has been renamed to _destruct(). Compiler: Rename LFUN::destroy() to LFUN::_destruct().
Compiler: Rename LFUN::destroy() to LFUN::_destruct(). As decided at Pike Conference 2017.
X86-64: Check C stack margin before adding stub stack frames.
Merge branch 'marty/call_frames' into 8.1 This introduces the --with-mc-stack-frames configure option, which will instruct the machine code generator to insert proper stack frames (currently only supported on X86-64). This is useful for profiling, especially in combination with Debug.generate_perf_map() on Linux.
Compiler: do not modify instrs array
unsigned INT64 -> UINT64
Compiler: Moved yyreport() et al to pike_compiler.cmod. More code cleanup.
Compiler [amd64]: Disassembler now supports narrow registers.
Compiler [amd64]: Use gdb-style disassembly syntax. Fixes multiple disassembly syntax and argument ordering issues.
Compiler [amd64]: Use string_builder_append_disassembly(). Improved formatting of disassembler output by using the new function.
Compiler [amd64]: Fixed disassembler table typo.
Compiler [amd64]: Fixed some invalid constants in the disassembler.
Compiler [amd64]: Fixed some disassembler lookup table typos.
Compiler [amd64]: Added some more sub-opcodes to the disassembler.
Compiler [amd64]: Support multi-byte opcodes in disassembler.
Compiler [amd64]: Support jmp instructions in disassembler. Also fixes argument order for some instructions.
Compiler [amd64]: More disassembler opcodes.
Compiler [amd64]: Multiple fixes of the disassembler.
Compiler [amd64]: First go at implementing DISASSEMBLE_CODE(). Work in progress; disassembly of SIB not yet implemented, argument order is not correct in some cases, but it does disassemble some opcodes correctly.
AMD64: use Pike_fp->current_storage
AMD64: reimplement F_PROTECT_STACK
Interpreter: simplify struct pike_frame - expendibles and save_sp are usually only used during setup and frame deallocation. It is enough to store them as offsets from 'locals' - reordered the struct entries to avoid some padding
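A sketch of the idea with made-up names (not the real struct pike_frame layout): since the two values are only consulted around frame setup and teardown, small offsets relative to 'locals' are enough, and the pointers can be reconstructed on demand.

    struct svalue;                       /* opaque in this sketch */

    struct frame_sketch {
      struct svalue *locals;
      short save_sp_offset;              /* was: struct svalue *save_sp */
      short expendible_offset;           /* was: struct svalue *expendible */
    };

    /* Reconstruct the pointers only when they are actually needed. */
    static struct svalue *frame_save_sp(struct frame_sketch *f) {
      return f->locals + f->save_sp_offset;
    }

    static struct svalue *frame_expendible(struct frame_sketch *f) {
      return f->locals + f->expendible_offset;
    }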
Silence warning
Compiler: reimplement F_PROTECT_STACK
Interpreter: store save_sp and expendibles as offsets
Unused version of mov_imm_mem16 for fast calls branch.
F_LOOP: Use <= 0, not == 0
Removed Intel IA64 compiler specific DO_NOT_WARN.
[amd64] Fixed compilation of F_CONSTANT in decoded programs. Note that the code generated for programs decoded via decode_value() is significantly slower than the code generated for programs compiled from Pike code. This is sort of unfortunate, since most modules are dumped.
Bypass the constant table, push value directly
[amd64] Bypass the string table and push the string directly
Today's micro-optimization: avoid re-doing overflowing multiplications. Use the full 128-bit result from imul. Not really all that much of an improvement in most cases, presumably. The same thing can however also be done for + and -, at a minimum. In general more mpz operations could be inlined where it makes sense (the operators, mainly).
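A hedged sketch of the approach using GCC/clang builtins rather than Pike's actual mpz glue; the point is only that the full product is already available after one imul, so the bignum path does not need to redo the multiplication.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
      int64_t a = INT64_C(0x100000000), b = INT64_C(0x100000001), res;
      if (!__builtin_mul_overflow(a, b, &res)) {
        printf("fits: %lld\n", (long long)res);      /* common case: 64 bits */
      } else {
        /* Overflow: keep the full 128-bit product (a single imul on amd64)
         * instead of redoing the multiplication on the bignum path. */
        __int128 full = (__int128)a * b;
        printf("overflow: high=%llx low=%llx\n",
               (unsigned long long)(full >> 64),
               (unsigned long long)(uint64_t)full);
      }
      return 0;
    }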
Compiler [amd64]: reload sp_reg after call into c code The stack pointer needs to be reloaded after calling F_LOOP. Otherwise, since the F_LOOP opcode function changes the stack pointer, it might be overwritten with the wrong value before calling a subsequent opcode function.
Removed trailing spaces.
Normalized file ends.
Compiler [amd64]: Added a few fall-through markers. Fixes [CID 1294640].
Compiler [amd64]: Fix bug in F_FOREACH. The initial foreach counter may be set to non-zero when foreach goes over a ranged array. If the initial foreach counter was larger than the size of the array, F_FOREACH started indexing outside the array. Fixes [bug 7426 (#7426)]. FIXME: Is there a corresponding problem with negative ranges?
Minimal optimization of mov_imm_reg
Compiler [amd64]: Fixed F_*CALL_BUILTIN* --with-debug. The use of ins_debug_instr_prologue() zapped ARG1_REG for at least F_MARK_CALL_BUILTIN.
Compiler [amd64]: Fixed code generator for INC/DEC. Fixes [bug 7384 (#7384)].
Added F_UNDEFINEDP as an opcode. Also added a convenience SVAL(X) macro that returns the offsets from the base register to the different parts of svalue X, and used it in a few places. Changed to 32-bit arithmetic for types in a few locations; it saves on code size, if nothing else (no REX prefix unless R8.. is used).
Checking using tst_reg32 is not optimal in branch_if_non_zero.
Fixed F_ADD, also disabled the int+int optimization in it. Most cases are already caught by the ADD_INTS opcode. At least if people type their code. Switched a few cmp(reg,PIKE_T_INT) to test(reg).
Fixed a few missing debug prologues.
Automatically convert cmp_reg[32](reg,0) to test_reg(reg).
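A sketch of the rewrite with toy emitters (the real amd64.c helpers have different signatures): TEST reg,reg sets the same zero/sign flags as CMP reg,0 but encodes without an immediate, so the compare-against-zero case can be routed through it automatically.

    #include <stdio.h>

    static void low_cmp_reg32_imm(int reg, int imm) { printf("cmp  $%d, %%r%dd\n", imm, reg); }
    static void test_reg32(int reg)                 { printf("test %%r%dd, %%r%dd\n", reg, reg); }

    /* All comparisons go through this wrapper, so cmp reg,0 becomes test reg,reg. */
    static void cmp_reg32_imm(int reg, int imm) {
      if (!imm) {
        test_reg32(reg);
        return;
      }
      low_cmp_reg32_imm(reg, imm);
    }

    int main(void) {
      cmp_reg32_imm(8, 0);     /* emits: test %r8d, %r8d */
      cmp_reg32_imm(8, 42);    /* emits: cmp  $42, %r8d  */
      return 0;
    }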
Added INC, ADD, DEC and SUBTRACT. These are inlined when both arguments are integers. It would be fairly easy to do for floats as well. DIV/MOD/AND/MULT etc. would be fairly easy to add too, but most require some more decoding of the Intel instruction reference manual.
Disable the most problematic opcodes.
Added a few more global variable opcodes. Gotta catch em all! This time: PRIVATE_IF_DIRECT_GLOBAL and ASSIGN_PRIVATE_IF_DIRECT_GLOBAL These will fetch or assign a global variable if the currently executing program is the program the object is cloned from. These are only slightly slower than the F_PRIVATE_GLOBAL family of opcodes, and the overhead if the global is not actually private is minimal. Missing: [ASSIGN_]PRIVATE_IF_DIRECT_TYPED_GLOBAL[_AND_POP] and ASSIGN_PRIVATE_IF_DIRECT_GLOBAL_AND_POP.
Compiler [amd64]: Fixed a few typos. The code generator for F_ASSIGN_PRIVATE_TYPED_GLOBAL_AND_POP was broken (generated code for F_ASSIGN_PRIVATE_TYPED_GLOBAL) due to a cut-n-paste miss. This caused some code in Roxen to fail. Fixes [LysLysKOM 20929878].
Removed CLEAR_2_LOCAL & CLEAR_4_LOCAL, added CLEAR_N_LOCAL. This simplifies things a bit, and reduces code size at times. The record I have seen while running the testsuite was a clear_n_local(23).
Added lvalue version of lexical_local
Added F_LEXICAL_LOCAL amd64 edition
Added F_CLEAR_4_LOCALS opcode. I see a need for a CLEAR_N_LOCALS one instead.
Added F_ASSIGN_PRIVATE_TYPED_GLOBAL[_AND_POP]. This completes the suite of private global opcodes.
Added machine code version of F_LOCAL_LOCAL_INDEX. Specifically, this optimizes array[int]. Also includes an incomplete f_branch_if_type_is_not.
Added F_PRIVATE_TYPED_GLOBAL. Much like PRIVATE_GLOBAL, but handles typed svalues (everything but int, function and object). No assign yet.
Comment and whitespace changes
Made some more functions static. Also added convenience functions to check if an svalue is a reference type.
Read somewhat fewer bytes. Mainly, this saves four bytes of code size for each branch_when_{eq,ne}.
Revert "Keep pike_fp->current_storage up to date in pike functions." This reverts commit 9129e401d0db1703a938794d2d61d73b4b214992.
Fixed "hilfe arrow up" crash. The code crashed when assigning a private global variable that was either an integer or float with bit 8 set.
Do not allow assignment of private variables in destructed objects.
Keep pike_fp->current_storage up to date in pike functions. This speeds up global variable accesses quite a lot.
Verify ENTRY_PROLOGUE_SIZE size.
Revert "Changed fast_call_threads_etc handling with valgrind" This reverts commit 1c4cf54199bd51903bc071a5aceff11e40c00222. Needs more work, currently it is causing crashes.
Generate more compact code for int+int.
Removed some #if 0:ed code. Fixes a warning when compiling with debug.
More compact type checks, no need to do a cmp, & is enough now.
Optimized access to private/final global variables. Especially the machine code version is now significantly faster: it simply reads the variable directly from the known byte offset instead of calling a function that resolves it in the vtable. Gives about a 20x speedup of trivial code along the lines of globala = globala + globalb; Also tried to disable some of the optimizations that cause lvalues to be generated instead of the desired global/assign_global opcodes. For now this is only done if the global variables are known to not be arrays, multisets, strings, mappings or objects, since those optimizations are needed to quickly append things to arrays (and mappings/multisets, but that is less common; it is also needed for destructive modifications of strings, something that is even less common).
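A rough illustration of the difference with stand-in types (not the real Pike structures): for a private/final global, the byte offset into the object storage is known at compile time, so the generated code can load the svalue directly instead of going through an identifier lookup.

    #include <stddef.h>
    #include <stdint.h>

    struct svalue_sketch { uint16_t type, subtype; int64_t integer; };
    struct object_sketch { char *storage; };

    /* Before: every access resolves the identifier through the vtable. */
    extern struct svalue_sketch *resolve_global(struct object_sketch *o, int identifier);

    /* After: private/final globals are read straight from a known offset. */
    static struct svalue_sketch *direct_global(struct object_sketch *o,
                                               size_t byte_offset) {
      return (struct svalue_sketch *)(o->storage + byte_offset);
    }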
Save a few bytes of code size for each free_svalue. 8-bit constants generate smaller code.
Changed fast_call_threads_etc handling with valgrind. Instead of disabling it entirely, clear it at function entry. This gets rid of the uninitialized value, and slows things down less than not doing the optimization.
Hide the REG_<X> macros/enums. It is just too easy to accidentally write REG_RBX instead of P_REG_RBX. This causes rather hard to find bugs in the generated code.
add_mem8_imm is used when not compiling with valgrind. Re-introduced the function
Added F_CALL_BUILTIN_N and F_APPLY_N. These call the constant in arg1 with arg2 arguments from the stack, and are used if the number of arguments is known and bigger than 1. It is not really all that big an optimization; it only removes the mark stack handling. And, in fact, since it removes some peep optimizations it might be somewhat slower when not using the amd64 machine code (since, as an example, APPLY/ASSIGN_LOCAL/POP is no longer an opcode that is used in this case). However, when using the amd64 code the assign local + pop opcode is highly optimized, so it's not an issue that it is not merged into the apply opcode. It is in fact more of a feature. For that reason the code in docode.c is currently conditional. The only code generator using it is the amd64 one.
Runtime: Unified struct svalue and struct fast_svalue. Modern gcc (4.7.3) had aliasing problems with the two structs, which caused changes performed with SET_SVAL() (which used struct fast_svalue) to not be reflected in TYPEOF() (which used struct svalue). This in turn caused eg casts of integers to floats to fail with "Cast failed, wanted float, got int". The above problem is now solved by having an actual union for the type fields in struct svalue. This has the additional benefit of forcing all code to use the svalue macros. NB: This code change will cause problems with compilers that don't support union initializers.
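A sketch of the unified layout with illustrative member names (not the real declarations): both views of the type word live in one union inside the same struct, so the compiler can no longer treat stores through the "fast" view and loads through the normal view as accesses to unrelated objects.

    #include <stdint.h>

    union anything_sketch { int64_t integer; double float_number; void *refs; };

    struct svalue_unified_sketch {
      union {
        struct {
          uint16_t type;
          uint16_t subtype;
        } t;                         /* the view behind TYPEOF()-style macros   */
        uintptr_t type_subtype;      /* whole-word view used for fast assignment */
      } tu;
      union anything_sketch u;
    };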
Hide unused opcodes.
[amd64] Fixed one more case broken by the svalue renumbering. Fixes [LysLysKOM 20484693]/[Pike mailinglist 13687]. Thanks to Chris Angelico <rosuav@gmail.com> for the report and test case.
Now compiles on modern Ubuntu versions. Newer versions of Linux have defines and enums defining REG_<X> for all amd64 registers, but they are not numbered in a logical manner. Fixed by renaming REG_<X> to P_REG_<X> in our file.
Reshuffle labels to avoid "Branch 130 too far" message seen after ba7d5e1fb6e8.
[amd64] Reorder the arguments to cmp_reg_reg(). cmp_reg_reg() now compares its arguments in the same order as the cmp_reg*_imm() variants. Fixes F_POP_TO_MARK. Probably fixes index overruns in F_INDEX and F_LOCAL_INDEX.
[runtime][amd64] Fixed some free_svalue-related bugs. free_svalues() now survives freeing unfinished arrays. amd64_free_svalue() now supports freeing PIKE_T_VOID svalues.
[amd64] Some constant folding in F_POS_INT_INDEX.
Runtime: Renumbered PIKE_T_*. Breaks ppc32 and ppc64. Renumber the low PIKE_T_* values so that PIKE_T_INT becomes zero. This has the feature that zeroed memory becomes filled with Pike svalues containing integer zeroes (and not NULL pointer arrays). This will let call_c_initializers() avoid traversing the entire identifier table for the class. Note: The serialized representation of types (__parse_pike_type()) is unchanged. As is the {out,in}put for {en,de}code_value(). Updates the code generators for ia32 and amd64. Breaks the code generators for ppc32 and ppc64.
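An illustration of the property this buys, with a stand-in struct rather than the real svalue: when PIKE_T_INT and the integer subtype are both zero, a zeroed block of memory is already a run of valid integer-zero svalues, so freshly cleared storage needs no per-element initialization.

    #include <stdlib.h>
    #include <assert.h>

    #define T_INT_SKETCH 0                       /* stands in for PIKE_T_INT == 0 */

    struct sval_sketch { unsigned short type, subtype; long integer; };

    int main(void) {
      struct sval_sketch *v = calloc(16, sizeof *v);
      assert(v != NULL);
      /* Every element already reads as an integer zero. */
      assert(v[7].type == T_INT_SKETCH && v[7].integer == 0);
      free(v);
      return 0;
    }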
[amd64] Fully inline RETURN. Inline LOCAL_2_GLOBAL
Valgrind friendly machine code
Compiler (amd64): Fixed bug in F_POS_INT_INDEX. The range check in F_POS_INT_INDEX used the wrong comparison opcode which caused indexing of arrays with their size to be allowed. Added some corresponding tests to the testsuite. Thanks to Stewa for the report.
Move label to keep jump distance below limit.
Compiler (amd64): Add some missing type checking to F_FOREACH. The first argument to F_FOREACH wasn't verified to be an array, which would cause core dumps if it wasn't. Fixes [Pike-mailing-list 13472]/[LysLysKOM 20109625].
Wrap unused parameters in UNUSED(), and debug-only parameters in DEBUGUSED(), to cut down on compiler warnings. The macro also renames parameters to catch accidental use. (There are more places to clean up but I don't want to modify code that isn't compiling on my machine.)
Have division by zero constant be compilation error on all platforms and not just amd64.
Fixed branch
Fixed error in F_SIZEOF_LOCAL_STRING when the argument is not actually a string
Added F_SIZEOF_STRING and F_SIZEOF_LOCAL_STRING We really should pass on the type to the code generator instead, I think. There should also be a "#pragma promise_correct_types" or something that would guarantee that the types are correct, and crash and burn if they are not. The generated code would be significantly smaller and faster.
Added F_NOT amd64 edition.
Added a few more amd64 opcodes, the comparisons. This adds support for F_EQ, F_NE, F_LE, F_GE, F_LT and F_GT. They could be better; it would be simple to add floats, for example, especially in F_EQ and F_NE (they /almost/ work for floats, but NaN complicates things).
Merge remote-tracking branch 'origin/8.0' into string_alloc Conflicts: src/stralloc.c
Silence warnings.
For now -- give up entirely on doing / and % with negative integers. x86 rounds toward zero, and Pike expects rounding toward negative infinity. This can be fixed in the opcodes, but it is somewhat harder than simply ignoring it for now. :)
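The mismatch in a few lines of plain C (a sketch of the semantics, not the Pike opcode code): the hardware quotient and remainder truncate toward zero, and turning that into Pike's floored behaviour needs a sign-dependent adjustment.

    #include <stdio.h>

    static long floor_div(long a, long b) {
      long q = a / b;                          /* idiv/C: rounds toward zero */
      if ((a % b) != 0 && ((a ^ b) < 0)) q--;  /* signs differ: round down instead */
      return q;
    }

    static long floor_mod(long a, long b) {
      long r = a % b;
      if (r != 0 && ((a ^ b) < 0)) r += b;     /* result takes the sign of b */
      return r;
    }

    int main(void) {
      printf("%ld %ld\n", floor_div(-7, 2), floor_mod(-7, 2));   /* prints "-4 1" */
      return 0;
    }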
Fixed some issues with +/- int and re-enabled F_MOD_INT.
Compiler [amd64]: Fixed code generator for F_RSH_INT. The wrong label was used.
Added a few more x86-64 opcodes F_XOR / F_XOR_INT F_DIVIDE_INT / F_MOD_INT and a partial implementation of F_POP_N_ELEMS.
Added F_DIVIDE
Added x86_64 support for F_MOD. The divide instruction also does mod, but for negative numbers Pike is somewhat different. So, be rather restrictive.
Unified some code. Added F_MULTIPLY. Fixed an issue in F_LSH_INT.
Added F_COMPL. Fixed F_AND_INT
Fixed F_LSH_INT.
Added x86-64 version of F_LSH_INT
Fixed non-int f_negate case
Added a few more opcodes F_NEGATE F_LSH F_AND_INT F_OR_INT F_RSH_INT F_SUBTRACT_INT
Reset the result types, clearing zero_type
Slightly shorter F_RSH. It is uncommon enough to do >> with values >= 64 that the C version is used for that case.
Added F_AND, F_OR and F_RSH opcodes
Merge remote-tracking branch 'origin/8.0' into string_alloc
Merge branch '8.0' into gobject-introspection
Merge remote-tracking branch 'origin/7.9' into pdf
Merge remote-tracking branch 'origin/7.9' into ba Conflicts: src/interpret.c src/interpret.h src/pike_embed.c
Casting to INT64 first is correct here. This reverts commit 806cc2fd28f3315d8aedf8325f8b85139439023c.
F_NEG_NUMBER: Don't cast to INT64 before negation. Solves a sign-extension issue when used with 0x80000000 as argument, though it's debatable whether this value should ever occur in the first place.
Merge branch '7.9' into gobject-introspection
[compiler][amd64] Can't easily compare functions and floats in F_BRANCH_WHEN_{EQ,NEQ}. Also corrected some comment typos.
Compiler (amd64): Changed calling convention for {jmp,call}_rel_imm*(). They now take the absolute address as argument, since the relative offset depends on the size of the generated opcode, which could vary. This fixes tlib/modules/Calendar.pmod/testsuite:433, where the second F_RETURN_IF_TRUE jumped five bytes too short into the code generated by the first.
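A toy sketch of why the callers now pass the absolute target (the code buffer and helper here are stand-ins, not the real amd64.c state): the rel32 field counts from the end of the jump instruction, so it can only be computed once the exact encoding, and therefore the length, of that instruction is known.

    #include <stdint.h>
    #include <string.h>

    static uint8_t code[64];
    static size_t pc;                     /* stand-in for PIKE_PC */

    static void jmp_abs(uintptr_t absolute_target) {
      code[pc++] = 0xe9;                  /* jmp rel32 */
      /* Displacement is relative to the address just after the 4-byte field. */
      int32_t rel = (int32_t)(absolute_target - ((uintptr_t)code + pc + 4));
      memcpy(code + pc, &rel, sizeof rel);
      pc += sizeof rel;
    }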
Merge branch '7.9' into block_alloc Conflicts: src/modules/system/configure.in src/post_modules/CritBit/tree_low.c src/post_modules/CritBit/tree_low.h src/post_modules/CritBit/tree_source.H
[compiler][amd64] Use new features from peep.c. Place the code that calls check_threads_etc before the function instead of inside it, to have one branch less in tight loops. This saves about 4% in the nested loops test, at the cost of 12 bytes of extra code space for functions that do not actually contain loops (for functions that do contain loops, 3 bytes are saved instead). One alternative would have been to place the code after the function if it does contain a loop, and then update the relative jumps to point to it. That is left for later.
Compiler (amd64): Load fp_reg even more consistently. It should be loaded even without PIKE_DEBUG...
Compiler (amd64): Load fp_reg more consistently. Background: Some of the opcode implementations use the C-implementation as a fallback for the more complex cases. These typically use amd64_call_c_opcode(), which calls maybe_update_pc(), which may call UPDATE_PC(), which calls amd64_load_fp_reg(), which loads fp_reg if it isn't thought to be loaded. Problem: This means that the opcodes in question sometimes will enter with fp_reg not loaded, and exit with fp_reg thought to be loaded even though it isn't loaded on all code-paths for the opcode. Solution: This patch loads fp_reg in the instruction prologue under the same circumstances where maybe_update_pc() would have loaded it.
Merge remote branch 'origin/7.9' into block_alloc
[compiler][amd64] Attempt at faster PC updates. Always just assign PC to the current address instead of adding the difference. This is somewhat faster. We still do too many updates, though; as an example, it is common to have 2-3 update_pc in a row.
[compiler][amd64] Added instr_prologue in a lot of places. Also, load sp register more consistently, and only when actually needed.
[compiler][amd64] Inline version of FOREACH. This is the old foreach( array, loop_variable ); about 10 times faster if the loop variable is a local variable in the function.
[compiler][amd64] Yet more inlined opcodes. Added inline versions of INDEX, INT_INDEX, NEG_INT_INDEX, LOCAL_INDEX. They are only inlined when the index is an integer and the item to be indexed is an array. Adding support for string[int] might be useful.
[compiler][amd64] Inline some more opcodes. Added SIZEOF, RETURN_LOCAL and CLEAR_2_LOCAL.
[compiler][amd64] More inline opcodes. Added the various *CALL*BUILTIN* opcodes.
[compiler][amd64] Inline a few more opcodes. Added inline versions of LTOSVAL2_AND_FREE, LTOSVAL, ASSIGN and ASSIGN_AND_POP. Slightly optimized BRANCH_WHEN_*ZERO and BRANCH_WHEN_*LOCAL.
[amd64] Fixed the comparison opcodes for real. Also, use 32-bit comparisons when possible, this saves one byte in the generated code per comparison since rex is not needed.
[amd64] Fixed fallback version of BRANCH_WHEN_ZERO.
[compiler][amd64] Real mov16 and mov8 added. These use the movzx instruction and are for unsigned numbers; versions using movsx are needed if signed numbers are to be used. Inlined a few more opcodes. Fixed branch when (non) zero and branch when local to correctly treat 0.0 as non-zero. Fixed clearing of the zero type in ADD_LOCAL_INT[_and_pop] and ADD_[NEG_]INT.
Compiler (amd64): Improved detection of use of stale registers.
Compiler (amd64): Fixed bug where F_FILL_STACK pushed one arg too many.
Merge remote branch 'origin/7.9' into block_alloc Conflicts: lib/modules/Tools.pmod/Shoot.pmod/module.pmod
[compiler][amd64] Cleaned up the code somewhat and made branches faster. Added FAST_BRANCH_WHEN{_,_NOT_}ZERO, which knows that sp[-1] is an integer and can thus avoid any type checking and the normal pop_stack checks. Also inlined the normal BRANCH_WHEN{_,_NON_}ZERO. There is now a common function that generates modrm+sib+offset for the *mem* family of functions. Also removed frame init/stack cleanup for functions that just return a constant.
Compiler (amd64): Inline some more opcodes.
Compiler (amd64): Make sure the fp_reg is loaded before use.
[compiler] Significantly faster simple loops. New opcodes: ASSIGN_LOCAL_NUMBER_AND_POP, ADD_LOCAL_NUMBER_AND_POP, ADD_LOCAL_LOCAL_AND_POP and ASSIGN_GLOBAL_NUMBER_AND_POP. The rationale for the assign_local variants is that it is significantly faster to do local=local and local+=[number||local] than it is to do local&, number, f_add_to and similar, the reason being that the locals act much like registers: they are easy to assign values to from the machine code level. Also added some perhaps dubious optimizations of the code that the tree optimizer produces for for-loops. The result of the above is that the NestedLoops* tests are about eight times faster, and run entirely in native code, without any function calls.
Compiler (amd64): Fixed code generator for shifts.
Compiler (amd64): Fixed a few typos. PIKE_INT_TYPE is signed...
Compiler (amd64): Fixed a few warnings.
Compiler (amd64): Added inlineing of a few more opcodes.
Compiler (amd64): Fixed PC-calculation.
Merge remote branch 'origin/7.9' into rblock_alloc Conflicts: src/post_modules/CritBit/floattree.cmod src/post_modules/CritBit/inttree.cmod src/post_modules/CritBit/stringtree.cmod
[compiler][amd64] Some more optimizations and changes. Added branch_check_threads_etc calls that had gone missing. Also changed how branch_check_threads_etc is called: the code now maintains a counter on the C stack, and if adding 1 to it (as a signed byte) causes it to overflow, the C function is called after adding 128 to the in-memory counter. This saves rather a lot of calls. Inlined F_{DUMB_,}RETURN, F_BRANCH_WHEN_{EQ,NE}, F_ADD_NEG_INT, F_ADD_INT and F_ADD_INTS.
Compiler (amd64): Inline F_POP_TO_MARK and F_FILL_STACK.
Compiler (amd64): low_mov_mem_reg() now supports REG_R12...
Compiler (amd64): Support using labels for backward jumps too.
Compiler (amd64): Minor optimization of F_LOCAL_2_LOCAL.
Compiler (amd64): Fixed typo in add_reg_imm_reg(). This typo broke F_LOCAL_2_LOCAL (and probably others).
Compiler (amd64): Fixed a few ins_debug_instr_prologue() calls.
Compiler (amd64): Fixed typo in F_SWAP.
Added yet more 'native' opcodes for AMD64/x86_64. Added inline versions of F_DUP, F_SWAP, F_LOOP and F_LOCAL_2_LOCAL. This almost doubled the speed of the 'Loops Nested (local)' benchmark.
Got rid of some C++-style comments.
Rewrote x86_64/AMD64 native code generation. De-macrofied the generator to make it at least somewhat easier to read, and attempted to make the opcode implementations easier to understand (at least if you have a copy of the Intel Software Developer's Manual in front of you). There are known issues with some opcodes; not all mov_* work with all registers as arguments (due to the x86 instruction encoding when register&7 is 3 or 4). Added a few more instructions, and changed some occurrences of things like 'mov $4, eax; mov eax,[ecx+off]' to 'mov $4,[ecx+off]'. Added a simple branch/label system to make it somewhat easier to write more complex opcodes. Inlined or partially inlined some opcodes: THIS_OBJECT, ASSIGN_LOCAL, ASSIGN_LOCAL_AND_POP, ASSIGN_GLOBAL, ASSIGN_GLOBAL_AND_POP, POP_VALUE, SIZEOF_LOCAL, CONSTANT, GLOBAL_LVALUE, LOCAL_LVALUE, BRANCH_IF[_NOT]_LOCAL, and fixed INIT_FRAME. Changed the handling of 'check_threads_etc'; it is now only called every 1024 branches (or, like normal, when functions are called). Overall, pure Pike-code execution is about 20% faster.
Compiler (amd64): Fixed typo.
Compiler (amd64): Disabled inlineing of F_INIT_FRAME. Needs either 16-bit or 32-bit store.
Compiler (amd64): Inline some of the new opcodes.
Interpreter mega patch: The global Pike_interpreter struct replaced with Pike_interpreter_pointer.
Compiler (amd64): Inline a few more opcodes.
Compiler (amd64): Improved robustness for PC-relative addressing. Should now support MacOS X.
Compiler (amd64): Compensate for larger CALL_ABSOLUTE() on MacOS X.
Compiler (amd64): Some fixes and potential support for MacOS X.
Compiler (amd64): Changed calling conventions for inter_return_opcode_F_CATCH().
Compiler (amd64): Check stack alignment in debug mode.
Compiler (amd64): Several bugfixes in the code-generator.
Compiler (amd64): Use LEA (%rip) to get the program pointer. Also some fixes of AMD64_MOVE_REG_TO_RELADDR().
Compiler (amd64): Use MOV instead of LEA to save a byte.
Compiler: Inline some common opcodes in the amd64 generator.
Compiler: Removed broken remnant of old code for amd64 machine-code.
Compiler: Support for machine-code for amd64 (aka x86_64) now seems to work.
Compiler: Implemented partial machine-code support for amd64 (aka x86_64).
Pike opcode arguments are signed (cf struct p_instr_s). Rev: src/code/amd64.c:1.2 Rev: src/code/bytecode.c:1.8 Rev: src/code/computedgoto.c:1.5 Rev: src/code/ia32.c:1.46 Rev: src/code/ia32.h:1.30 Rev: src/code/ppc32.c:1.41 Rev: src/code/sparc.c:1.48 Rev: src/pikecode.h:1.14
Source file for amd64 code generation. Rev: src/code/amd64.c:1.1