Skip to content

Port R_PointOnSide calculation from GZDoom

Hanicef requested to merge Hanicef/SRB2:port-pointonside-zdoom into next

So I've been hammering on performance for 2.2.14 for a bit before, and even though we've got some decent improvements out of it, it has always bothered me that there's still a ton of room for improvement. However, it just struck me today: how have other DOOM engines handled these issues? This isn't really unique to SRB2, the foundations are all the same for all DOOM derivatives, so logically, some of the optimizations on other engines should be applicable on SRB2, too.

The engine that I immediately turned myself towards was GZDoom, since it's by far the most mature and well-developed DOOM engine that exists. So, I started digging, and this is what I found:

//==========================================================================
//
// R_PointOnSide
//
// Traverse BSP (sub) tree, check point against partition plane.
// Returns side 0 (front/on) or 1 (back).
//
// [RH] inlined, stripped down, and made more precise
//
//==========================================================================

inline constexpr int R_PointOnSide (fixed_t x, fixed_t y, const node_t *node)
{
        return DMulScale (y-node->y, node->dx, node->x-x, node->dy, 32) > 0;
}

They had one-lined this function. And not only does it look fast, it is fast, too. After porting the code to SRB2, I did some quick benchmarking and got these results:

Before:

clipboard

After:

clipboard

NOTE: these measurements were done on a debug build, since my build environment is a bit of a mess and it was easiest for me, so the results might differ in practice.

On top of that, I also did another minor optimization regarding object placement. In the code, we do this a lot: P_CheckPosition(mobj, mobj->x, mobj->y);. This is a bit problematic performance-wise, since inside that function, we do newsubsec = R_PointInSubsector(x, y); which means we traverse the BSP tree, but in this case, we really don't need to since mobj->subsector already has the subsector we're looking for on those coordinates. I haven't done any measurements on how this impacts performance, though, but it ought to improve it a bit.

Merge request reports

Loading