Use -fvisibility=hidden. This lets gcc turn calls to functions that are globally visible (non-static) but not exported with PMOD_EXPORT into direct calls. That saves a few percent of CPU time and also decreases the size of the binary. However, it is now important to use PMOD_EXPORT correctly on every system, not only on Windows. The change also significantly speeds up loading of dynamic modules, although module loading generally does not use any significant amount of CPU anyway.
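
To illustrate the effect of the flag, here is a minimal sketch. DEMO_EXPORT is a hypothetical stand-in for PMOD_EXPORT (the real macro may be defined differently), and internal_helper and public_api are made-up names used only for this example.

```c
/* Hypothetical stand-in for PMOD_EXPORT; the real definition may differ. */
#if defined(_WIN32)
#  define DEMO_EXPORT __declspec(dllexport)
#elif defined(__GNUC__)
#  define DEMO_EXPORT __attribute__((visibility("default")))
#else
#  define DEMO_EXPORT
#endif

/* Non-static but not marked for export. With -fvisibility=hidden this
 * symbol gets hidden visibility, so the compiler knows it cannot be
 * overridden from outside and may call it directly. */
int internal_helper(int x)
{
    return x * 2;
}

/* Explicitly exported: stays visible to dynamic modules even when the
 * file is compiled with -fvisibility=hidden. */
DEMO_EXPORT int public_api(int x)
{
    return internal_helper(x) + 1;
}
```

Assuming the file is saved as demo.c, building it with `gcc -fvisibility=hidden -fPIC -shared demo.c -o libdemo.so` and listing the dynamic symbols with `nm -D libdemo.so` should show public_api but not internal_helper. A dynamic module that needs a function lacking the export marker would then fail to resolve it, which is why the marker has to be correct on all platforms.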