Clean up d_net.c and push back missed acks to acktosend (Chatbug attempt 3)
Well, they do say "third time's the charm", and honestly, this feels right to me.
When you end up staring at the same code for days, you start to notice a bunch of ways of improving and, and there was quite a bit of code there that was just flat out unused. However, what only struck me this morning is that !2238 (merged) might've just been a partial fix, since Net_SendAcks
only sends acks that is registered in acktosend
, but what I realized is that it might've missed an ack entirely. So, as another attempt, this change actually pushes back old acks in case they're missed, even if they're already queued up.
Here's where this feels right, though: with an unreasonably high amount of packet loss, excess reliable packets aren't being mass-delayed anymore. What I mean is that when spamming Lua commands to try to reproduce the bug, it doesn't delay inputs for many seconds, sometimes up to a minute with enough spam but rather flushes them all at once upon the next successful ack. The result is that it feels more responsive, which is how networking should feel like in my experience.
Even if this attempt fails, though, I will still keep the cleanup to make the the code at least a bit more tidy. For now, I will be hosting on next/2.2.14pre4 the following days with this patch applied while awake to make sure it's fixed - if you want to help out, don't hesitate to join.
Merge request reports
Activity
Ok, so after about 4 hours of testing, it seems like it's less privalent than it used to be, since we only got 2 chatbugs with 2-3 players, including @Lugent. Dissecting the network logs, it looks like
firstacktosend
was reset to 0, which shouldn't be possible since it prevents acks from being sent back.I need to dig further, but this looks promising at least - last session had way more chatbugs over a shorter period of time.
I've studied the code, and I saw it now:
acktosend
wasn't being cleared upon initialization. Because of this, old acks from previous players on that node carried over to new connections, completely throwing off the ack system. This explains why I've struggled so much to reproduce this, since I never even though of the fact of reconnecting to the server to try to reproduce it.It also explains why Lugent chatbugged yesterday: I was doing some testing with Mixtro using a secondary connection beforehand, using up 3 node slots, and none of us chatbugged, so chances are the leftover acks from those connections carried over to Lugent's and Bluelight's, and it once again explains why that was the only chatbug that happened since the server restarted after that, resetting the
acktosend
buffer.I don't think !2238 (merged) is needed anymore either, since I've never chatbugged locally even before it was merged. I'm going to do some testing today too, just to make sure, but this must be it, since it explains all of the weirdness and inconsistency of it all.
Also, to have a faster method of communication, it might be best if we brought the talking to the old #srb2fun IRC channel for now. It's a good option for this since you only need to specify a nick to join it, so there's little to no hurdle to join.
EDIT: KiwiIRC is having issues, so use this temporary channel for now: https://web.libera.chat/##srb2chatbug
Edited by Hanicefuh, what is IRC?
Btw is says unknown error.
Edited by Lugent
After some extensive testing, I think we finally have something that ultimately fixes chatbug. Essentially, the
acktosend
mechanism that caches out-of-order packets for faster response handling isn't strictly necessary, and removing it altogether seems to fix it.The only potential issue is that average latency might increase because the server no longer processes packets that it already has gotten ahead of time, but we didn't notice any change in latency during testing, so this might be a non-issue in practice. Either way, this seems like a functioning fix for it, and it stabilizes netcode a lot, too.
The only issue that we've observed is that the game seems to freeze for about 6 seconds when disconnecting from a server, but at least that's consistent so it should be easy to fix, not to mention that I'll take a 6 second to leave a server bug over a bug that breaks networking any day of the week.
added 1 commit
- ab95b010 - Fix short freeze when disconnecting from a server
added 67 commits
-
ab95b010...7b6bf976 - 62 commits from branch
STJr:next
- 369c327e - Clean up d_net.c and push back missed acks to acktosend
- 14020eda - Clear acktosend when initializing node
- 07738b61 - Remove acktosend
- 3578f173 - Fix short freeze when disconnecting from a server
- 094dfe9f - Fix crash when listing servers in the MS
Toggle commit list-
ab95b010...7b6bf976 - 62 commits from branch
added 3 commits
-
094dfe9f...04c67be8 - 2 commits from branch
STJr:next
- 977ce7b3 - Merge branch SRB2:next into push-back-acktosend
-
094dfe9f...04c67be8 - 2 commits from branch
added Bug Netcode Refactoring labels
mentioned in issue #1395
added 76 commits
-
977ce7b3...8701ef41 - 75 commits from branch
STJr:next
- 499ccdb8 - Merge branch SRB2:next into push-back-acktosend
-
977ce7b3...8701ef41 - 75 commits from branch