NPO2 slope span optimization

I don't understand slope drawing well enough to remove the modulo operations from the non-power-of-two (NPO2) slope span functions, so I optimized them with libdivide (https://libdivide.com/) instead. That library, contained in a single header file, speeds up division (and modulo) when the same divisor is used many times. I also reduced the number of modulo operations per pixel from 2-4 to always 2. The functions are now 1.5x - 3x faster.
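For readers unfamiliar with the technique, here is a minimal sketch of the general idea: do the expensive division once per divisor, then reduce each pixel coordinate with a multiply, a shift, and one conditional subtract. This is not the code from this merge request and not libdivide's actual algorithm (libdivide's C API exposes this pattern through functions such as `libdivide_u32_gen` and `libdivide_u32_do`). The names `fastmod_t`, `fastmod_gen`, `fastmod_do`, and `span_indices` are hypothetical, and coordinates are assumed to be unsigned for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a divisor plus its precomputed reciprocal. */
typedef struct {
    uint32_t d; /* the divisor, e.g. a non-power-of-two texture width */
    uint64_t m; /* floor(2^32 / d), computed once per divisor */
} fastmod_t;

/* Precompute the reciprocal. This is the only real division. */
static fastmod_t fastmod_gen(uint32_t d)
{
    fastmod_t f;
    assert(d != 0);
    f.d = d;
    f.m = (UINT64_C(1) << 32) / d;
    return f;
}

/* Compute n % d without a division instruction. */
static uint32_t fastmod_do(uint32_t n, const fastmod_t *f)
{
    /* Estimated quotient: never too large, and at most 1 too small. */
    uint32_t q = (uint32_t)(((uint64_t)n * f->m) >> 32);
    uint32_t r = n - q * f->d; /* so r is in [0, 2d) */
    if (r >= f->d)
        r -= f->d;             /* single correction step */
    return r;
}

/* Assumed simplification of a span loop: fill `out` with wrapped texel
 * indices, using exactly two reductions per pixel. */
static void span_indices(uint32_t *out, int count,
                         uint32_t u, uint32_t v,
                         uint32_t ustep, uint32_t vstep,
                         uint32_t width, uint32_t height)
{
    const fastmod_t mw = fastmod_gen(width);
    const fastmod_t mh = fastmod_gen(height);
    int i;

    for (i = 0; i < count; i++) {
        uint32_t x = fastmod_do(u, &mw);
        uint32_t y = fastmod_do(v, &mh);
        out[i] = y * width + x;
        u += ustep;
        v += vstep;
    }
}
```

The reciprocals are generated once per span, outside the pixel loop, which is where the saving over a plain `%` per pixel would come from.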

Screenshots of the best-case improvement:

Before:

srb20299

After:

srb20302
