FIX: prevent hanging on server shutdown #546

Open
ianhi wants to merge 4 commits into jupyterlab:main from ianhi:fix/clean-shutdown-rooms

ianhi commented Feb 12, 2026

Fixes #161. Supersedes #514.


🤖 Full disclosure: this is a PR for which I used Claude for nearly everything. I have reviewed every line to the best of my ability and confirmed that the fix works for me locally. I did multiple local rounds of review, tightening up exceptions and logic and improving comments. But ultimately I learned more about async through this PR than I knew at the start, so it is very possible that there is a subtlety I missed.

To debug, I ended up designing a loop for Claude in which it used jupyter-collab-mcp to interact with the notebooks (creating rooms is crucial for triggering the bug), then sent SIGTERMs from a shell with PYTHONFAULTHANDLER=1, and looped through possible fixes until one worked. I then guided Claude through coming up with a minimal fix.


There were three issues preventing shutdown on receiving Ctrl-C:

1 Client serve loop never finishing

The server was stopped, but the client rooms were never stopped. We needed to send a stop message to the clients to cause their loops to stop.

# results in corrupted data.
# room_tasks = list()
# for name, room in list(self.rooms.items()):
#     for task in room.background_tasks:
#         task.cancel()  # FIXME should be upstreamed
#         room_tasks.append(task)
# if room_tasks:
#     _, pending = await asyncio.wait(room_tasks, timeout=3)
#     if pending:
#         msg = f"{len(pending)} room task(s) are pending."
#         self.log.warning(msg)
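
As a rough illustration of the stop-message idea in item 1 (this is not the code in this PR): a loop that drains a queue can be ended from outside by enqueueing a sentinel, so the task finishes on its own rather than being cancelled. The names `serve` and `STOP`, and the choice of an empty bytes message as the sentinel, are assumptions made for this sketch.

```python
import asyncio

# Hypothetical sentinel: an empty message tells the loop to exit.
STOP = b""


async def serve(queue: asyncio.Queue, received: list) -> None:
    """Handle messages until the stop sentinel arrives."""
    while True:
        message = await queue.get()
        if message == STOP:
            break  # leave the loop cleanly; no cancellation involved
        received.append(message)


async def main() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    received: list = []
    task = asyncio.create_task(serve(queue, received))

    queue.put_nowait(b"update-1")
    queue.put_nowait(STOP)  # ask the loop to stop

    # The task finishes by itself, so waiting on it does not hang.
    await asyncio.wait_for(task, timeout=1)
    return received, task.cancelled()
```

Messages queued before the sentinel are still processed, which is the main difference from calling `task.cancel()` on the loop.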

2 Rooms never being stopped

Rooms were persisted after a client disconnect (I think to allow reconnecting). But on shutdown they were never stopped, so the task stayed alive. Now we explicitly stop all rooms.

This option means they weren't being auto-closed on disconnect (which I think is good):

rooms_ready=False,
auto_clean_rooms=False,
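
A minimal sketch of what stopping every room explicitly on shutdown might look like. `Room` here is a hypothetical stand-in, not pycrdt's room class, and `stop_all_rooms` is an illustrative helper name.

```python
import asyncio


class Room:
    """Hypothetical stand-in for a collaboration room with one background task."""

    def __init__(self) -> None:
        self._stop_event = asyncio.Event()
        self.task = None

    def start(self) -> None:
        self.task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        # Lives until someone explicitly stops the room.
        await self._stop_event.wait()

    async def stop(self) -> None:
        self._stop_event.set()
        if self.task is not None:
            await self.task


async def stop_all_rooms(rooms: dict) -> None:
    # Copy the values so the dict may change while we await.
    for room in list(rooms.values()):
        await room.stop()


async def main() -> list:
    rooms = {"notebook-a": Room(), "notebook-b": Room()}
    for room in rooms.values():
        room.start()
    await asyncio.sleep(0)  # let the room tasks start running
    await stop_all_rooms(rooms)
    return [room.task.done() for room in rooms.values()]
```

Without the `stop_all_rooms` call, both room tasks in this sketch would still be pending when `main()` returns.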

3 The root task was never awaited

Because we didn't hold on to a reference to the server start task, there was no way to await it and ultimately trigger cleanup of the worker threads, so they were never cleaned up. Now we save a reference and await it at the end of clean().

await ensure_async(super().prepare())
if not self._websocket_server.started.is_set():
    self.create_task(self._websocket_server.start())
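
The pattern in item 3, keeping a reference to the long-running start task and awaiting it with a timeout during cleanup, could look roughly like this. `Server`, `clean`, and the 3-second timeout are placeholders for illustration, not the actual classes or values in this repository.

```python
import asyncio


class Server:
    """Hypothetical server whose start() runs until stop() is called."""

    def __init__(self) -> None:
        self._stop_event = asyncio.Event()
        self.start_task = None
        self.cleaned_up = False

    async def start(self) -> None:
        try:
            await self._stop_event.wait()
        finally:
            # Stands in for teardown that only runs when start() finishes.
            self.cleaned_up = True

    def stop(self) -> None:
        self._stop_event.set()


async def clean(server: Server, timeout: float = 3.0) -> None:
    server.stop()
    task = server.start_task
    if task is None:
        return
    # Await the saved task so its teardown actually runs.
    _, pending = await asyncio.wait([task], timeout=timeout)
    for leftover in pending:
        leftover.cancel()  # do not leave a timed-out task behind
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)


async def main() -> tuple:
    server = Server()
    server.start_task = asyncio.create_task(server.start())
    await asyncio.sleep(0)  # let start() begin waiting
    await clean(server)
    return server.start_task.done(), server.cleaned_up
```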

@github-actions

Contributor

Binder 👈 Launch a Binder on branch ianhi/jupyter-collaboration/fix%2Fclean-shutdown-rooms

for _, room in list(self.rooms.items()):
    for client in list(room.clients):
        tasks.extend(client._background_tasks)  # type: ignore[attr-defined]
        client._message_queue.put_nowait(b"")  # type: ignore[attr-defined]

Author

It might be nice to have the client expose a disconnect method that just handles this. It looks rather mysterious from here.

ianhi commented Feb 12, 2026

Author

Hmm, even though this works more often than before, it still doesn't work all of the time. :'( Draft for now.

@ianhi ianhi marked this pull request as draft February 12, 2026 19:55
@krassowski krassowski added the bug Something isn't working label Feb 13, 2026
@ianhi ianhi force-pushed the fix/clean-shutdown-rooms branch from eb0b575 to d0ad3a4 Compare February 16, 2026 16:13
On SIGTERM, non-daemon anyio WorkerThread instances blocked Python's
threading._shutdown() because the root asyncio task was never awaited,
so its done-callbacks (which send shutdown signals to worker threads)
never fired.

Fix:
- Track the root asyncio task (server.start()) so clean() can await it
- Stop all rooms explicitly during shutdown (auto_clean_rooms=False
  means pycrdt never stops them automatically)
- Await the root task with a timeout so anyio's task group __aexit__
  runs and worker threads get their shutdown signal
- Cancel timed-out tasks in stop_extension() instead of ignoring them

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ianhi commented Feb 16, 2026

Author

Found the final issue with the help of Claude and PYTHONFAULTHANDLER=1.

The main serve task was never being awaited, so some worker threads were hanging around and preventing shutdown. Updated description incoming, but I think this works now!
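
For anyone retracing this kind of debugging: PYTHONFAULTHANDLER=1 enables the standard-library faulthandler module at startup. A similar picture of which threads are still alive can be obtained from inside a process, roughly as below. The helper names are made up for this sketch and are not part of the PR.

```python
import faulthandler
import tempfile
import threading


def dump_thread_stacks() -> str:
    """Return the stack of every running thread as text."""
    # faulthandler writes straight to a file descriptor, so use a real file.
    with tempfile.TemporaryFile(mode="w+") as handle:
        faulthandler.dump_traceback(file=handle, all_threads=True)
        handle.seek(0)
        return handle.read()


def lingering_threads() -> list:
    """Non-daemon threads other than the main thread; these block interpreter exit."""
    main = threading.main_thread()
    return [
        thread
        for thread in threading.enumerate()
        if thread is not main and not thread.daemon
    ]
```

Calling `lingering_threads()` near the end of shutdown shows which threads the interpreter will still wait for, and `dump_thread_stacks()` shows where each one is blocked.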

@ianhi ianhi marked this pull request as ready for review February 16, 2026 16:55
ianhi commented Feb 16, 2026

Author

Not sure who to ask for a review here, but I can confirm that this fixes it for me locally. I'm not really sure why the Windows 3.14t test failed :/ It doesn't seem obviously related to these changes?

@davidbrochart davidbrochart left a comment

Collaborator

This is why structured concurrency is helpful. Jupyverse's RTC implementation is much simpler and doesn't have this bug.

Comment on lines +86 to +89
task = asyncio.create_task(self._websocket_server.start())
self._background_tasks.add(task)
task.add_done_callback(self._background_tasks.discard)
self._websocket_server._start_task = task

Collaborator

You could have self.create_task() return the task:

Suggested change
- task = asyncio.create_task(self._websocket_server.start())
- self._background_tasks.add(task)
- task.add_done_callback(self._background_tasks.discard)
- self._websocket_server._start_task = task
+ self._websocket_server._start_task = self.create_task(self._websocket_server.start())
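
A self-contained sketch of the helper shape suggested here, where create_task() both tracks the task and hands it back to the caller. `TaskOwner` is a hypothetical class for illustration, not the handler in this PR.

```python
import asyncio


class TaskOwner:
    """Hypothetical owner of background tasks."""

    def __init__(self) -> None:
        self._background_tasks = set()

    def create_task(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        # Hold a strong reference until the task is done.
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)
        return task  # returning it lets callers keep and await the task


async def main() -> tuple:
    owner = TaskOwner()
    task = owner.create_task(asyncio.sleep(0, result="started"))
    tracked = task in owner._background_tasks
    result = await task
    await asyncio.sleep(0)  # give done-callbacks a chance to run
    return tracked, result, len(owner._background_tasks)
```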


# Now disconnect existing clients so their serve() tasks complete.
tasks: list[asyncio.Task] = []
for _, room in list(self.rooms.items()):

Collaborator

Suggested change
- for _, room in list(self.rooms.items()):
+ for room in list(self.rooms.values()):

# Rooms persist after client disconnect so users can reconnect without
# a full re-sync (see auto_clean_rooms=False in app.py). On shutdown
# we must stop them explicitly.
for _, room in list(self.rooms.items()):

Collaborator

Suggested change
- for _, room in list(self.rooms.items()):
+ for room in list(self.rooms.values()):

ianhi commented Feb 17, 2026

Author

Thank you for the review! I cleaned up based on your suggestions. Two follow-ups:

  1. I'm down to use systems that work better! But how would Jupyverse have fixed this? Per the docs it still would have used this library, and I thought the issues were internal. (This may be showing my async noviceness.)

  2. I'm noticing now that I get clean shutdowns with multiple notebooks, but not when I have a Jupyter terminal open. Any ideas on that?

@krassowski

Member

Jupyverse does not use jupyter_server_ydoc and instead reimplements it. It still depends on other packages from jupyter-collaboration (like the ones providing the UI).

@krassowski

Member

@ianhi do you think this is worth merging and releasing or is (2) a blocker?

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Jupyter Collaboration prevents JupyterLab from being killed from the terminal

3 participants