NPO2 slope span optimization

Hannu Hanhi requested to merge Hannu_Hanhi/SRB2:sw-tilted-npo2-span-opt into next

I don't understand slope drawing well enough to remove the modulo operations from the non-power-of-two (NPO2) slope span functions, so I optimized them with libdivide (https://libdivide.com/) instead. That library, contained in a single header file, speeds up division (and modulo) when the same divisor is used many times. I also reduced the number of modulo operations per pixel from 2-4 to always 2. The functions are now 1.5x - 3x faster.
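For readers unfamiliar with the idea, here is a minimal sketch of why a repeated divisor can be made cheap. It is not libdivide's API or its algorithm, and it is not the code in this merge request. It is a simplified reciprocal-multiplication remainder restricted to 16-bit operands, and the names `fastmod16_t`, `fastmod16_gen`, `fastmod16_do` and `wrap_span` are hypothetical. The point is that the one real division happens once per span, and each pixel then costs two multiplications and a shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a divisor plus its precomputed reciprocal. */
typedef struct {
    uint32_t magic;   /* ceil(2^32 / divisor), truncated to 32 bits */
    uint16_t divisor; /* e.g. a non-power-of-two texture width such as 96 */
} fastmod16_t;

/* Perform the one real division up front, once per span. */
static fastmod16_t fastmod16_gen(uint16_t divisor)
{
    fastmod16_t m;
    assert(divisor != 0);
    m.magic = UINT32_MAX / divisor + 1u; /* wraps to 0 when divisor == 1 */
    m.divisor = divisor;
    return m;
}

/* Compute n % divisor with two multiplications and a shift. */
static uint16_t fastmod16_do(const fastmod16_t *m, uint16_t n)
{
    /* The low 32 bits of magic * n encode the fractional part of n / divisor. */
    uint32_t lowbits = m->magic * (uint32_t)n;
    /* Scaling that fraction by the divisor recovers the remainder. */
    return (uint16_t)(((uint64_t)lowbits * m->divisor) >> 32);
}

/* Hypothetical per-span loop: wrap a run of texture coordinates in place. */
static void wrap_span(uint16_t *coords, int count, uint16_t tex_width)
{
    const fastmod16_t m = fastmod16_gen(tex_width);
    int i;
    for (i = 0; i < count; i++)
        coords[i] = fastmod16_do(&m, coords[i]);
}
```

For example, calling `wrap_span` on the coordinates 95, 96 and 200 with a width of 96 leaves 95, 0 and 8 in the array. libdivide generalizes the same precompute-then-multiply approach to wider integer types.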

Screenshots of best improvement scenario:

Before:

srb20299

After:

srb20302