I'm honestly surprised that this has gone unnoticed for so long. It turns out the software renderer has had an optimization in its BSP tree traversal that avoids unnecessary recursion for quite a while, but it was never implemented in the OpenGL renderer, so I'm doing it now. I haven't run any performance tests, though, since I assume that whoever made the optimization in the software renderer already did.
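
For anyone unfamiliar with this kind of optimization, here is a rough sketch of the general idea as I understand it: recurse only into the front child and loop on the back child instead of recursing into both. This is not the actual renderer code. `Node`, `Trace`, `render_leaf`, `render_recursive`, and `render_looped` are all made-up names for illustration, `back_visible` stands in for the real bounding-box visibility check, and the front/back children are assumed to be already ordered relative to the viewer.

```c
#include <stddef.h>

/* Hypothetical BSP node: a leaf has both children set to NULL. */
typedef struct Node {
    const struct Node *front;
    const struct Node *back;
    int id;            /* only meaningful on leaves */
    int back_visible;  /* stand-in for a bounding-box visibility check */
} Node;

/* Records what was drawn and how many function calls it took. */
typedef struct {
    int order[16];
    int leaves;
    int calls;
} Trace;

static int is_leaf(const Node *n)
{
    return n->front == NULL && n->back == NULL;
}

static void render_leaf(const Node *n, Trace *t)
{
    if (t->leaves < 16)
        t->order[t->leaves++] = n->id;
}

/* Naive version: one call per node, for both children. */
void render_recursive(const Node *n, Trace *t)
{
    t->calls++;
    if (is_leaf(n)) {
        render_leaf(n, t);
        return;
    }
    render_recursive(n->front, t);
    if (n->back_visible)
        render_recursive(n->back, t);
}

/* Looped version: recurse into the front child only, then continue
 * with the back child in the same stack frame. */
void render_looped(const Node *n, Trace *t)
{
    t->calls++;
    while (!is_leaf(n)) {
        render_looped(n->front, t);
        if (!n->back_visible)
            return;       /* back side culled, nothing left to do */
        n = n->back;      /* replaces the second recursive call */
    }
    render_leaf(n, t);
}
```

Both versions visit the leaves in the same order. The looped one just makes fewer function calls, because the back side of each node is handled by the loop rather than by a new stack frame.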