Make sure all allocations are properly aligned
After compiling with -fsanitize=undefined and testing, I found that the engine is allocating huge amounts of misaligned data. While this is not strictly a problem, it does cause performance issues because of how the CPU accesses memory. CPUs read memory in fixed-size chunks determined by their bus width, so a misaligned access can straddle two chunks and need multiple cycles to fetch the full value. This can have a measurable impact on performance, which is exactly the case here. After checking with MangoHUD, the average performance improved after fixing the alignment issues:
Noticeable differences:
- The CPU usage dropped by 1% (CPU usage was already low because my PC is high-spec).
- The max frame time dropped significantly.
- The GPU usage dropped by 2%.
It's not a major improvement, but it's still an improvement. For more background, see section 3.6 in the Intel Software Optimization Guide: https://cdrdv2-public.intel.com/671488/248966-Software-Optimization-Manual-R047.pdf
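
For reviewers, here is a minimal sketch of the general idea behind this kind of fix. It is not the engine's actual code: the names `align_up`, `is_aligned`, `alloc_aligned`, and `free_aligned` are hypothetical, and it assumes C++17 (for aligned `operator new`) and a power-of-two alignment.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: round `size` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two.
constexpr std::size_t align_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: true when `ptr` is a multiple of `alignment`.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Hypothetical wrapper: allocate `size` bytes (padded to a multiple of
// `alignment`) using the C++17 aligned form of operator new.
inline void* alloc_aligned(std::size_t size, std::size_t alignment) {
    return ::operator new(align_up(size, alignment),
                          std::align_val_t{alignment});
}

// Memory from alloc_aligned must be released with the matching aligned
// operator delete, passing the same alignment.
inline void free_aligned(void* ptr, std::size_t alignment) {
    ::operator delete(ptr, std::align_val_t{alignment});
}
```

A debug-only check such as `assert(is_aligned(p, alignof(T)))` at the point where raw memory is cast to `T*` is one way to catch regressions without running the sanitizer every time.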