CopilotClient.stop() leaks the CLI server's child process tree on Windows (orphaned node/copilot.exe per session) #1804

Description

@tic-top

Summary

On Windows, CopilotClient.stop() terminates only the top-level launcher process, not the CLI server's child process tree. Every create_session() / stop() cycle therefore orphans a full copilot process tree (node.exe plus the copilot.exe broker/worker/webview helpers) that survives until its own idle timeout. In a long-running app that creates one client/session per unit of work (e.g. a batch/eval harness), this deterministically leaks one process tree per session; the leaked trees accumulate and eventually exhaust the host's memory (OOM).

Environment

  • github-copilot-sdk (Python) 1.0.0b7 (also visible by inspection in current client.py)
  • Copilot CLI runtime 1.0.66-0
  • Windows 11, Python 3.11 (miniconda)
  • Transport: StdioRuntimeConnection, use_logged_in_user=True

Root cause

CopilotClient.stop() only calls terminate() / kill() on its own launcher Popen (self._process):

# client.py (stop)
if self._process and not self._is_external_server:
    self._process.terminate()
    try:
        self._process.wait(timeout=5)
    except subprocess.TimeoutExpired:
        self._process.kill()
    self._process = None

On Windows, terminating the launcher does not cascade to its descendants. The launcher (copilot.cmd, run through cmd.exe) spawns node.exe, which spawns the real copilot.exe server and its helper processes; these are left orphaned and alive after stop().

Reproduction

import asyncio

import psutil
from copilot import CopilotClient, StdioRuntimeConnection

CLI = r"C:\Users\<you>\AppData\Roaming\npm\copilot.cmd"

def n_copilot():
    return sum(p.info["name"] == "copilot.exe"
               for p in psutil.process_iter(["name"]))

async def main():
    for i in range(5):
        client = CopilotClient(
            connection=StdioRuntimeConnection(CLI, []),
            use_logged_in_user=True,
        )
        session = await client.create_session()
        await session.send("hello")
        # ...consume events until SessionIdle...
        await session.disconnect()
        await client.stop()
        print(f"iter {i}: copilot.exe alive = {n_copilot()}")

asyncio.run(main())

Observed (Windows): copilot.exe alive grows 1, 2, 3, 4, 5 — one orphaned tree per iteration, none reaped by stop().

Expected

After await client.stop() (for a client that spawned the server), the entire CLI server process tree should be terminated, leaving no orphaned node.exe / copilot.exe processes.

Workaround

Capture the launcher PID after create_session() and kill the whole tree explicitly on teardown. For example, enumerate psutil.Process(pid).children(recursive=True) before killing the parent and then kill every process in that list along with the parent, or use taskkill /F /T /PID <pid> on Windows or os.killpg on POSIX.
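A minimal sketch of the taskkill / os.killpg variant of this workaround, using only the standard library. The helper name kill_process_tree is hypothetical, and how you obtain the launcher PID is an assumption: the stop() excerpt above suggests the private attribute client._process.pid, which is not a public API and may change. On POSIX the sketch refuses to signal the group when the target shares the caller's own process group, since os.killpg would then kill the calling program as well.

```python
import os
import signal
import subprocess
import sys


def kill_process_tree(pid: int) -> None:
    """Best-effort kill of `pid` and its descendants (hypothetical helper)."""
    if sys.platform == "win32":
        # /T walks the child process tree, /F forces termination.
        subprocess.run(
            ["taskkill", "/F", "/T", "/PID", str(pid)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        return

    try:
        pgid = os.getpgid(pid)
        if pgid == os.getpgid(0):
            # The target shares our process group: killpg would also
            # kill this program, so only signal the single process.
            os.kill(pid, signal.SIGKILL)
        else:
            os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the process (or group) is already gone
```

Call it during teardown, before or instead of relying on client.stop() to clean up, with the PID captured while the launcher was still alive. On POSIX the group kill only covers descendants if the launcher was started in its own process group or session; otherwise only the launcher itself is killed.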

Suggested fix

Have the SDK own the process-tree lifecycle so stop() reaps descendants:

  • Windows: assign the launcher to a Job Object with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE, so the OS atomically kills the tree when the job/handle closes.
  • POSIX: spawn with start_new_session=True and os.killpg(os.getpgid(pid), SIGKILL) on stop.
  • Or a psutil-based recursive child kill inside stop().
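As a rough illustration of the POSIX option, the sketch below starts the launcher in its own session and reaps the whole process group on stop. It is not the SDK's code: spawn_in_own_group and stop_tree are hypothetical names, and the Windows branch falls back to taskkill /T rather than a Job Object, which would need Win32 API calls not shown here. Unlike the bullet above, it sends SIGTERM to the group first and SIGKILL afterwards, mirroring the terminate-then-kill order of the existing stop().

```python
import os
import signal
import subprocess
import sys


def spawn_in_own_group(argv: list[str], **popen_kwargs) -> subprocess.Popen:
    """Start argv so its descendants can later be reaped as one unit."""
    if sys.platform != "win32":
        # New session => new process group whose id equals the child's pid.
        popen_kwargs["start_new_session"] = True
    return subprocess.Popen(argv, **popen_kwargs)


def stop_tree(proc: subprocess.Popen, timeout: float = 5.0) -> None:
    """Signal the whole group: SIGTERM first, then SIGKILL for stragglers."""
    if sys.platform == "win32":
        # Fallback only; a Job Object would be the more robust choice.
        subprocess.run(
            ["taskkill", "/F", "/T", "/PID", str(proc.pid)],
            capture_output=True,
            check=False,
        )
        proc.wait()
        return

    pgid = proc.pid  # valid because the child leads its own session
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            os.killpg(pgid, sig)
        except (ProcessLookupError, PermissionError):
            break  # nothing left in the group that can be signalled
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            continue  # leader ignored the signal; escalate
    proc.wait()  # reap the launcher so it does not linger as a zombie
```

SIGKILL is sent to the group even when the launcher exits promptly on SIGTERM, because descendants may outlive the launcher; that is the failure mode this issue describes.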

This is likely related to the CLI-side reports of orphaned processes (e.g. github/copilot-cli#1368, #2279) but is reproducible purely through the SDK's own create_session()/stop() lifecycle.
