Add flow for re-sending replay frames that spectator did not receive at end of play#38163
Add flow for re-sending replay frames that spectator did not receive at end of play#38163bdach wants to merge 1 commit into
Conversation
…at end of play Compatibility matrix: | | old client | new client | | :--------: | :--------: | :--------: | | old server | 🟠[^1] | 🟠[^1] | | new server | 🟠[^2] | 🟢[^3] | [^1]: Operations will succeed, but the server never attempts to retrieve any dropped frames because it is not aware of the new flow, so the resulting replay will be incomplete. [^2]: Operations will succeed, but the server will not attempt to use the new flow to retrieve any dropped frames, because the client will not send `LastFrameBundleSequenceNumber`, so the resulting replay will be incomplete. [^3]: Operations will succeed. The server will attempt to retrieve dropped frames in a once-off operation at the end of gameplay. This implements the proposal first outlined in ppy/osu-server-spectator#244 (comment). With this change, if a player's connection drops out during a play but then recovers before the end of the play, the server will re-request any frame bundles that it has not received thus far. The client caches all frame bundles it sends out until the end of the play and the request for any missing frame bundles from the server. The frame bundles for past plays are purged only when the server invokes `CompleteReplay()`. (If the server determines that it has received all frame bundles, it will still invoke `CompleteReplay()`, but with an empty list of bundle sequence numbers.) I expect this caching strategy to be controversial, so I am listening to counterproposals (LRU / limited capacity queue? something else?)
| /// <param name="FrameBundleSequenceNumbers">The sequence numbers of frame bundles that the server never received.</param> | ||
| [Serializable] | ||
| [MessagePackObject] | ||
| public record CompleteReplayRequest( |
There was a problem hiding this comment.
One thing I got in the habit of doing with bancho/stable is prefixing these kinds of calls with Client or Server, because I guarantee we are going to end up with Request objects being fired in both directions sooner or later.
Thoughts?
ServerCompleteReplayRequest
ClientCompleteReplayResponse
There was a problem hiding this comment.
Other than the fact that no other operation between client and spectator server uses this convention, I do not have any particular opinions on this proposal.
There was a problem hiding this comment.
Until now we haven't had any classes with Request / Response pairing, so I'm not sure it would suit there. Specifically important for when Request/Response is involved IMHO, because until now the client is always the Requester when using these terms (aka web requests).
But we could potentially prefix other classes where relevant.
| /// Used to determine ordering of frame bundles, and for server-side checks that server received all frame bundles it was supposed to. | ||
| /// </summary> | ||
| [Key(2)] | ||
| public long? SequenceNumber { get; set; } |
There was a problem hiding this comment.
We already have ReceivedTime in FrameHeader, which ended up never being used, a bit weird.
Anyway, any reason this is in the bundle and not the header? I'm not sure what the differentiation we have as to what goes in either, but curious if you have reasoning for putting it here specifically.
There was a problem hiding this comment.
We already have
ReceivedTimeinFrameHeader, which ended up never being used, a bit weird.
No idea what this is, as best I can tell it's completely dead.
Anyway, any reason this is in the bundle and not the header? I'm not sure what the differentiation we have as to what goes in either, but curious if you have reasoning for putting it here specifically.
There is no reason. Putting it on the bundle makes the server side a touch less painful because it removes one level of property scraping but that's not a real reason.
| private long currentFrameBundleSequenceNumber; | ||
|
|
||
| private readonly Queue<FrameDataBundle> pendingFrameBundles = new Queue<FrameDataBundle>(); | ||
| private readonly Dictionary<long, List<FrameDataBundle>> allFrameBundles = new Dictionary<long, List<FrameDataBundle>>(); |
There was a problem hiding this comment.
Wonder how big this will grow during a multi-hour marathon beatmap 🤔. Might be worth calculating on paper.
There was a problem hiding this comment.
Messages can be up to 32 KB in size by default. Not yet sure how this translates to actual usage, will investigate.
There was a problem hiding this comment.
Back of napkin estimations are not great.
frameBundle: 459 B
maxMessageSize: 32 * 1024 B
bundlesPerSecond: 5
maxMessageSize / (frameBundle * bundlesPerSecond) = 14,28 [seconds]
Listening to suggestions what to do about it.
316 of those 459 bytes are the actual replay frames, so even dropping the frame headers does not help here much.
There was a problem hiding this comment.
I'd chunk the responses client side to a sane number, for sure.
But also I wasn't even thinking about max message size (so good you were I guess?). Was more about client side memory usage for very long maps. But 128 kb per minute sounds not too bad.
There was a problem hiding this comment.
I'd chunk the responses client side to a sane number, for sure.
I don't understand what this means. Please elaborate.
Half of the appeal of this solution was that this was a single-shot request that doesn't need retrying. If suddenly I have to split this single-shot request into however many, each of which can fail (what happens when any one of them does?), this is going to get very complex very quick.
I'd sooner entertain solutions like having the client send the .osr across or similar. Maybe that at least can fit into a reasonably small size.
|
|
||
| public Task<CompleteReplayResponse> CompleteReplay(CompleteReplayRequest completeReplayRequest) | ||
| { | ||
| if (!allFrameBundles.Remove(completeReplayRequest.ScoreTokenId, out var frameBundlesForScoreToken)) |
There was a problem hiding this comment.
You touched on this in the OP, but I fear that this is not enough in terms of cleanup logic.
If a user reconnects after a long time being disconnected and the server has forgotten about the user's pending replay data, it's going to remain in the client dictionary until restart, correct?
There was a problem hiding this comment.
If a user reconnects after a long time being disconnected and the server has forgotten about the user's pending replay data, it's going to remain in the client dictionary until restart, correct?
Correct. I have no opinion on how to handle that as choosing an expiry mechanism is highly subjective.
|
To summarise conversation that happened elsewhere regarding this PR today:
Blocking for now due to all of the above. @peppy please confirm that I haven't misrepresented anything. |
RFC. Lightly tested full-stack with 2 PCs to simulate real drop-offs, but probably still needs some further testing.
Compatibility matrix:
This implements the proposal first outlined in ppy/osu-server-spectator#244 (comment).
With this change, if a player's connection drops out during a play but then recovers before the end of the play, the server will re-request any frame bundles that it has not received thus far.
The client caches all frame bundles it sends out until the end of the play and the request for any missing frame bundles from the server. The frame bundles for past plays are purged only when the server invokes
CompleteReplay(). (If the server determines that it has received all frame bundles, it will still invokeCompleteReplay(), but with an empty list of bundle sequence numbers.)I expect this caching strategy to be controversial, so I am listening to counterproposals (LRU / limited capacity queue? timed expiry? something else?)
Footnotes
Operations will succeed, but the server never attempts to retrieve any dropped frames because it is not aware of the new flow, so the resulting replay will be incomplete. ↩ ↩2
Operations will succeed, but the server will not attempt to use the new flow to retrieve any dropped frames, because the client will not send
LastFrameBundleSequenceNumber, so the resulting replay will be incomplete. ↩Operations will succeed. The server will attempt to retrieve dropped frames in a once-off operation at the end of gameplay. ↩