Optimize lookups of constants and fields in Lua

Closed Hanicef requested to merge Hanicef/SRB2Classic:optimize-lua-value-lookup into next
2 unresolved threads

After doing some profiling with OProfile, I found that lookups for constants and fields in Lua scripts performed really badly. This is caused by the lookups being basic loops that check each value one by one, which performs horribly for large tables. However, I noticed that all of these tables are static, which makes binary search a good candidate for optimizing them, and after doing some tests, the difference is huge: when playing on script-heavy servers, most of the lag is gone, and the little that remains I can only guess is caused by suboptimal scripts (still need to investigate that, though).
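
For readers who want a concrete picture of the approach, here is a minimal, self-contained C sketch of the idea: keep the table of constants sorted by name and resolve each lookup with bsearch() instead of a linear scan. The names and values below are purely illustrative; this is not the actual code in this MR.

    /* Sketch only: a static table of constants, kept sorted by name,
     * searched with the C library's bsearch() for O(log n) lookups
     * instead of a linear scan. */
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        const char *name;
        int value;
    } luaconst_t;

    /* Must stay sorted by name (e.g. sorted once with qsort at startup). */
    static const luaconst_t constants[] = {
        {"MT_PLAYER", 1},
        {"MT_RING",   2},
        {"MT_THOK",   3},
    };

    static int const_cmp(const void *key, const void *elem)
    {
        return strcmp((const char *)key, ((const luaconst_t *)elem)->name);
    }

    /* Returns the matching entry, or NULL if the name is unknown. */
    static const luaconst_t *lookup_constant(const char *name)
    {
        return bsearch(name, constants,
                       sizeof(constants) / sizeof(constants[0]),
                       sizeof(constants[0]), const_cmp);
    }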

In case you want more information on how binary search works, feel free to read this: https://hanicef.me/document/binary-search


Activity

  • added Enhancement Lua labels

    • Contributor

      Performance-wise, it would probably be better to just put the constants in _G, and as far as I know there was already an MR about that. However, this is still good for global variables and userdata.
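
      Purely as an illustration of the suggestion above (not actual SRB2 code): registering constants directly in _G from the C side would look roughly like this with the standard Lua 5.1 API, after which the VM's own hash table handles the name lookup. Names and values are made up.

      #include "lua.h"

      /* Hypothetical registration: each constant becomes an ordinary global. */
      static void register_constants(lua_State *L)
      {
          lua_pushinteger(L, 1);          /* illustrative value */
          lua_setglobal(L, "MT_PLAYER");  /* _G.MT_PLAYER = 1 */

          lua_pushinteger(L, 2);
          lua_setglobal(L, "MT_RING");
      }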

    • Author Contributor

      Good idea, but the question is whether that would be a breaking change: the current code handles prefixes in a special way, so moving them to _G could cause constants to be evaluated in a different order (maybe we risk constants becoming writable, or something along those lines?).

      Edited by Hanicef
  • sphere added Optimisation label and removed Enhancement label

  • Contributor

    Constants being writable is a valid concern. I don't see how we could prevent scripts from editing them if they are in _G. Well, we can already overwrite them, but rawset was needed for that. If we put them in _G, then a standard assignment would edit them and break scripts.

  • Would it really though? It's not like scripts attempt to reassign these constants the normal way anyways. And like you said, it's already possible via rawset, so it doesn't really matter how you do it, as long as you can still do it.

  • I personally do not think the writability is a significant issue, for the same reason SMS mentioned.

    However, I am fairly sure there is also a merge request that optimises userdata field access, so it is likely this merge request may be entirely redundant.

  • My bad; it would appear this is actually wrong and there is no merge request open for it. However, I am almost certain Hannu Hanhi (whom I apparently cannot mention here?) does have such a branch, either on his computer or pushed to his personal repo.

    Edited by LJ Sonic
  • However, I am fairly sure there is also a merge request that optimises userdata field access, so it is likely this merge request may be entirely redundant.

    Sorry if this reply was unclear, but essentially, there is already an existing branch on Hannu Hanhi's repository with no merge request currently associated with it. From memory, it is conceptually the same as yours, but instead of a binary search it relies on a traditional Lua look-up table to fetch the field's index, which is a more efficient approach.

    The only downside is that according to Hannu, the branch mostly works but is not completely finished either. Hannu is not planning to finish it anytime soon, if ever, so if you are interested, you may consider finishing it and merging the finished version instead of your original branch.
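
    For illustration only (this is not Hannu Hanhi's actual branch): the look-up-table approach described above could be sketched roughly like this with the standard Lua 5.1 C API. A name-to-index table is built once and stashed in the registry, and the __index metamethod then does a single hash lookup to find which field was requested. All names and indices here are made up.

    #include "lua.h"
    #include "lauxlib.h"

    static int fields_ref = LUA_NOREF;

    /* Build the name -> index table once, e.g. at library init. */
    static void build_field_table(lua_State *L)
    {
        lua_newtable(L);
        lua_pushinteger(L, 1);
        lua_setfield(L, -2, "x");      /* fields["x"] = 1 */
        lua_pushinteger(L, 2);
        lua_setfield(L, -2, "y");      /* fields["y"] = 2 */
        fields_ref = luaL_ref(L, LUA_REGISTRYINDEX);
    }

    /* In the userdata __index metamethod: one hash lookup instead of a scan.
     * Returns the field's index, or 0 if the name is unknown. */
    static int field_index(lua_State *L, const char *name)
    {
        int idx;
        lua_rawgeti(L, LUA_REGISTRYINDEX, fields_ref);
        lua_getfield(L, -1, name);
        idx = (int)lua_tointeger(L, -1); /* nil converts to 0 */
        lua_pop(L, 2);
        return idx;
    }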

    Edited by LJ Sonic
  • Author Contributor

    I noticed that !1721 (merged) was merged, but after doing some performance tests with this code:

    for i = 0,0xfffff do
    	local v = MT_NULL
    	v = MT_RAY
    	v = MT_SKYBOX
    	v = MT_SPARK
    	v = MT_EXPLODE
    	v = MT_HOOP
    	v = MT_AXIS
    	v = MT_RAIN
    	v = MT_ROSY
    	v = MT_TARGET
    	v = MT_LAVAFALL
    	v = MT_FLAMEJET
    	v = MT_FLAME
    	v = MT_GFZTREE
    	v = MT_LETTER
    	v = MT_ROCKET
    	v = MT_LASER
    	v = MT_BUBBLES
    	v = MT_FAN
    	v = MT_RING
    	v = MT_THOK
    	v = MT_PLAYER
    	v = MT_UNKNOWN
    end

    I got these results on my system:

    !1721 (merged PR): 10.723304s

    !2016 (this PR): 1.243832s

    After digging further into why, it's actually pretty obvious: the code for Lua tables with string keys is just a simple loop iteration: https://git.do.srb2.org/STJr/SRB2/-/blob/next/src/blua/ltable.c#L452-L463

    This is no better than the previous implementation. Implementing binary search for Blua would definitely be something worth looking into, though, since I reckon it could greatly improve the performance of Lua code, but it would be more work.

    For now, though, this PR seems more effective at solving the performance problems, so I'd suggest we roll back !1721 (merged) and merge this instead. Oh, and if you want to run the performance test yourself, use this patch:

    diff --git a/src/lua_script.c b/src/lua_script.c
    index 9c7636ebe..b0afc6378 100644
    --- a/src/lua_script.c
    +++ b/src/lua_script.c
    @@ -22,6 +22,7 @@
     #include "g_game.h"
     #include "g_input.h"
     #include "f_finale.h"
    +#include "i_system.h"
     #include "byteptr.h"
     #include "p_saveg.h"
     #include "p_local.h"
    @@ -569,6 +570,8 @@ INT32 lua_lumploading = 0;
     static inline void LUA_LoadFile(MYFILE *f, char *name, boolean noresults)
     {
            int errorhandlerindex;
    +       double start = (double)I_GetPreciseTime() / I_GetPrecisePrecision();
    +       double end;
     
            if (!name)
                    name = wadfiles[f->wad]->filename;
    @@ -590,6 +593,9 @@ static inline void LUA_LoadFile(MYFILE *f, char *name, boolean noresults)
            lua_remove(gL, errorhandlerindex);
     
            lua_lumploading--; // turn off again
    +
    +       end = (double)I_GetPreciseTime() / I_GetPrecisePrecision();
    +       CONS_Printf("Loaded Lua script in %lfs\n", end - start);
     }
     
     // Load a script from a lump

    It prints to the console the time it took for the script to finish loading, which is how I got the results above.

  • Contributor

    Using _G being this much slower is odd, as it actually uses a hash table and not a simple loop like you say. It's worth gathering more data before deciding.

  • Author Contributor

    Using _G being this much slower is odd, as it actually uses a hash table and not a simple loop like you say.

    AFAICT it uses a hash table for integers but not for strings, which can be seen in the function above the one I linked. Or is the logic for _G entirely different from other tables? (If so, please link the code so I can examine it.)

    Also, I tried adding more MT_* fields to the test and the difference became even larger. I suspect it has to do with the fact that the binary search table is constant-sized since it loads all enums immediately, while _G becomes progressively slower as more values are added to the table.

  • Contributor

    I perhaps don't understand it well, but I don't see an iterative loop there; what I see is a hash table loop. You may be right and I may be misreading the results, but that's what I think the code you showed is. Could someone who has taken the time to understand the VM check whether it's a simple loop or a hash table lookup? When I have time, I will run the test and perhaps try to understand this.
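
    For reference, here is a small self-contained sketch of what a chained hash lookup looks like; stock Lua 5.1 resolves string keys this way, and assuming blua does not diverge from stock Lua here, the loop in question only walks one bucket's collision chain rather than the whole table. This is an illustration, not the blua source.

    #include <stddef.h>
    #include <string.h>

    typedef struct node_s {
        const char *key;
        int value;
        struct node_s *next;   /* collision chain within one bucket */
    } node_t;

    typedef struct {
        node_t **buckets;
        size_t nbuckets;       /* power of two */
    } table_t;

    static unsigned long hash_str(const char *s)
    {
        unsigned long h = 5381;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h;
    }

    /* The hash picks one bucket; only that bucket's (short) chain is walked. */
    static const node_t *table_get(const table_t *t, const char *key)
    {
        const node_t *n = t->buckets[hash_str(key) & (t->nbuckets - 1)];
        for (; n; n = n->next)
            if (strcmp(n->key, key) == 0)
                return n;
        return NULL;
    }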

  • Contributor

    Bad news for you.
    I cloned your fork, compiled master, next and optimize-lua-value-lookup with the same settings, and ran your test a few times; the results are closer to what I expected.
    Master: 34.291s.
    Next: 0.516s.
    optimize-lua-value-lookup: 5.706s.

    Edited by Lamibe
  • Author Contributor

    Ah, yeah, I forgot to pull after the MR was merged, my bad. I'm getting the same results as you after a pull, and it is indeed faster.

    We could still use this PR to optimize mobjlib, but I think it's better to close this and apply the same strategy to optimize that too.

  • closed
