Today's micro-optimization: avoid redoing overflowing multiplications.
Use the full 128-bit result from imul instead. Probably not much of an
improvement in most cases.
However, the same thing can also be done for + and -, at a minimum.
In general more mpz operations could be inlined where it makes sense
(the operators, mainly).
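
A minimal sketch of the idea, not the actual mpz code: do one widening
operation and, if the result does not fit in 64 bits, hand the two
halves to the big-number path instead of computing it a second time.
The names wide_result, from_wide, mul_keep_product, add_keep_sum and
sub_keep_diff are hypothetical, and the sketch assumes GCC or Clang on
a 64-bit target (for __int128, modular narrowing and arithmetic right
shift).

```cpp
#include <cstdint>

// Hypothetical stand-in for a small/big integer; NOT the real mpz type.
struct wide_result {
    bool     fits;   // true if the value fits in int64_t
    int64_t  small;  // the value, valid when fits is true
    uint64_t lo;     // low 64 bits of the 128-bit two's-complement value
    int64_t  hi;     // high 64 bits (carries the sign)
};

// Split a 128-bit value into the small or the two-limb representation.
static inline wide_result from_wide(__int128 v) {
    // Narrowing keeps the low 64 bits (modular on GCC/Clang).
    int64_t low = static_cast<int64_t>(static_cast<uint64_t>(v));
    if (v == low)
        return {true, low, static_cast<uint64_t>(low), low < 0 ? -1 : 0};
    // Overflow: keep both halves so nothing has to be recomputed.
    return {false, 0, static_cast<uint64_t>(v), static_cast<int64_t>(v >> 64)};
}

// 64x64 -> 128 multiply; on x86-64 this is typically a single imul.
static inline wide_result mul_keep_product(int64_t a, int64_t b) {
    return from_wide(static_cast<__int128>(a) * b);
}

// The same approach for + and -, computed in the wider type.
static inline wide_result add_keep_sum(int64_t a, int64_t b) {
    return from_wide(static_cast<__int128>(a) + b);
}

static inline wide_result sub_keep_diff(int64_t a, int64_t b) {
    return from_wide(static_cast<__int128>(a) - b);
}
```

In this sketch a caller would check fits first and only build a big
number from hi and lo when it is false. Keeping the helpers static
inline also lets the compiler fold the fast path into an operator.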