Skip to content

Fix dedicated node takeover

Ashnal requested to merge bird/Kart-Public:fix-dedi-node-takeover into master

This should fix the dedicated node takeover issue that completely breaks servers and requires them to be restarted. The bug happens when two separate clients (nodes) join on the same exact tic, which triggers a miscounting of connected players, as the first player to join in the tic is not properly counted until the next tic. This results in the second not being refused (as they should have been) and proceeding into code that does not guard against adding a player with no room, causing a wrap and replacement of the dedicated node, completely breaking everything.

This commit changes the player accounting to be based on the playernode array, as that is updated mid-tic when multiple joins occur, while D_NumPlayers is not updated until the next tic.

I have tested this both on my own, as well as with a popular public server community that encounters this bug semi-regularly. On my own I ran with a dedicated server, 3 clients of 4 players, a client with 2 players, and a client with 1 player. Then I would start up about 12 more clients and have them initiate the wait to join screen, which sends periodic join requests. Then have one of the single clients leave to create a slot, and watch for a simultaneous join from the others. While doing so I did not encounter the bug. However, I also tried the same on an install without this fix and did not trigger it either. What I did conclude was that the change did not break anything.

From there, I enlisted the help of *Mel#4172 over at Mocha's community. They compiled this code into 1.3 and hosted a server with it. Yesterday, they ran this server along with a stream to increase visibility and popularity of the server. During this stream, I also performed the same actions above, running about 16 clients on my end and cycling joins into the server while the stream was going and others were playing in that server. While the stream only had low viewership, there were at least a few other clients trying to join in addition to mine, so I could guestimate anywhere between 17-24 clients trying to connect during the 30 minute period that I was attempting to trigger the bug. You can watch starting here at the link timestamp https://youtu.be/L4o76LWXcOM?t=3407 . As a note, I started out pretty slow, but later on I started cycling clients quickly once I got a rhythm going a bit less than 10 minutes in.

Prior to this and after, Mel has been running the patch on their servers (6 of them) with no issues. At the very least I can confirm the patch does not break anything. And while it is quite difficult to prove a negative, I believe this patch fixes the issue, and will not create any further issues if merged.

Also, PoV you're me testing this: unknown__1_

Edited by Ashnal

Merge request reports