Optimize lookups of constants and fields in Lua
After doing some profiling using OProfiler, I found that lookups for constants and fields in Lua scripts performed really badly. This is caused by the lookups being only basic loops that checks each value, something that performs horribly for large tables. However, I noticed that all of these tables are static, which makes binary search a good candidate for optimizing this issue, and after doing some tests, the difference is huge! When playing on script-heavy servers, most lag is gone and there's only some minor lag which I can only guess is caused by suboptimal scripts (still need to investigate that, though).
In case you want more information on how binary search works, feel free to read this: https://hanicef.me/document/binary-search
Merge request reports
Activity
added Enhancement Lua labels
Good idea, but question is if that would be a breaking change, since the current code handles prefixes in a special way, so moving it to
_G
could possibly cause constants to be evaluated in a different order (maybe we risk constants being writable, or something along those lines?).Edited by Hanicef
added Optimisation label and removed Enhancement label
My bad; it would appear this is actually wrong and there is no merge request open for it, however I am almost certain Hannu Hanhi (which I apparently cannot mention here?) does have such a branch, either on his computer or pushed to his personal repo.
Edited by LJ SonicIf constants being writable isn't a problem perhaps compare with !1721 (merged) to see if this is redundant or not.
Looking through !1721 (merged), it's not entirely redundant: this PR also optimizes field lookups in mobjs in Lua, which !1721 (merged) doesn't. So if !1721 (merged) is OK, we can always merge it and remove enum optimizations from this PR and keep the mobj optimizations.
However, I am fairly sure there is also a merge request that optimises userdata field access, so it is likely this merge request may be entirely redundant.
Sorry if this reply was unclear, but essentially, there is already an existing branch on Hannu Hahni's repository with currently no merge request associated. From memory, it is conceptually the same as yours, but instead of a binary search, it relies on a traditional Lua look-up table to fetch the field's index, which is a more efficient approach.
The only downside is that according to Hannu, the branch mostly works but is not completely finished either. Hannu is not planning to finish it anytime soon, if ever, so if you are interested, you may consider finishing it and merging the finished version instead of your original branch.
Edited by LJ SonicI noticed that !1721 (merged) was merged, but after doing some performance tests with this code:
for i = 0,0xfffff do local v = MT_NULL v = MT_RAY v = MT_SKYBOX v = MT_SPARK v = MT_EXPLODE v = MT_HOOP v = MT_AXIS v = MT_RAIN v = MT_ROSY v = MT_TARGET v = MT_LAVAFALL v = MT_FLAMEJET v = MT_FLAME v = MT_GFZTREE v = MT_LETTER v = MT_ROCKET v = MT_LASER v = MT_BUBBLES v = MT_FAN v = MT_RING v = MT_THOK v = MT_PLAYER v = MT_UNKNOWN end
I got these results on my system:
!1721 (merged) (merged PR): 10.723304s
!2016 (closed) (this PR): 1.243832s
After digging further into why, it's pretty obvious actually: the code for Lua tables with strings for keys are just simple loop iterations: https://git.do.srb2.org/STJr/SRB2/-/blob/next/src/blua/ltable.c#L452-L463
This is no better than the previous implementation. But implementing binary search for Blua would definitely be something to look into, since I'd reckon we can greatly improve performance on Lua code by doing so, but it would be more work.
For now, though, this PR seems more efficient on solving the performance problems, so I'd suggest we roll back !1721 (merged) and merge this instead. Oh, and if you want to the performance test yourself, use this patch:
diff --git a/src/lua_script.c b/src/lua_script.c index 9c7636ebe..b0afc6378 100644 --- a/src/lua_script.c +++ b/src/lua_script.c @@ -22,6 +22,7 @@ #include "g_game.h" #include "g_input.h" #include "f_finale.h" +#include "i_system.h" #include "byteptr.h" #include "p_saveg.h" #include "p_local.h" @@ -569,6 +570,8 @@ INT32 lua_lumploading = 0; static inline void LUA_LoadFile(MYFILE *f, char *name, boolean noresults) { int errorhandlerindex; + double start = (double)I_GetPreciseTime() / I_GetPrecisePrecision(); + double end; if (!name) name = wadfiles[f->wad]->filename; @@ -590,6 +593,9 @@ static inline void LUA_LoadFile(MYFILE *f, char *name, boolean noresults) lua_remove(gL, errorhandlerindex); lua_lumploading--; // turn off again + + end = (double)I_GetPreciseTime() / I_GetPrecisePrecision(); + CONS_Printf("Loaded Lua script in %lfs\n", end - start); } // Load a script from a lump
It prints out the amount of time it took for the script to finish loading into the console, which is how I got the results.
Using _G being this much slower is odd as it actually use an hashed table and not a simple loop like you say.
AFAICT it uses a hashed table for integers but not strings, which can be seen in the function above the one I linked. Or the logic for
_G
entirely different from other tables? (if so, please link the code so I can examine it)Also, I tried adding more
MT_*
fields to the test and the difference became even larger. I suspect it has to do with the fact that the binary search table is constant-sized since it loads all enums immediately, while_G
becomes progressively slower as more values are added to the table.I perhaps don't understand it well but I don't see an iterative loop here but an hash table loop. You may be right and I don't understand the results but that what I think the code you shown is. Can someone who take time to understand the VM see if it's a simple loop or an hashtable loop? When I will have time I will run the test and perhaps try to understand this.
Bad new for you.
I cloned your fork. Compiled master, next and optimize-lua-value-lookup with the same settings and ran your test a few times and the results are closer to what I expected.
Master: 34.291s.
Next: 0.516s.
optimize-lua-value-lookup: 5.706s.Edited by Lamibe