WIP: SM8550 deep suspend/resume #2954

Draft
shuuri-labs wants to merge 15 commits into ROCKNIX:next from shuuri-labs:sm8550-suspend-resume-wip

shuuri-labs commented Jul 1, 2026

Summary

This PR enables the deep suspend path (S2RAM) for SM8550, with testing focused on the Retroid Pocket 6.

The foundation is @jaewun's SM8550 suspend/resume series, cherry-picked with his authorship intact: UFS PM-resume completion servicing and out-of-band relink drain, hibern8 vs clk-gating collisions, IRQ balancing on error paths, ICE crypto clock plumbing, geni UART irq masking, gamepad MCU quiesce/reinit, deep-only suspend, and the rsinput version handshake. This is his tested lineage, not the later thor-suspend-fixes-v1 reroll, which reworks the UFS relink handling and drops the ICE core-clk unwind; I have not built or tested that.

On top I add three commits:

  • broaden the TSENS uplow-wake skip from the AYN Thor match to qcom,sm8550 so it also covers the RP6
  • CONFIG_PM_SLEEP_DEBUG so testers can read /sys/power/pm_wakeup_irq
  • mask the IPCC mailbox irq on suspend, lifted from SM8750 (Odin 3)

The IPCC change is the wake fix for the RP6. The ADSP charger firmware pushes an unsolicited BATTMGR_NOTIFICATION (opcode 0x7) about 0.5s after suspend entry. It rings the IPCC mailbox irq, which is IRQF_NO_SUSPEND upstream so it is never masked and wakes the device every few minutes. Dropping IRQF_NO_SUSPEND masks it across suspend. SM8750 already ships this; SM8550 was just missing it.
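
The effect of dropping the flag can be pictured with a small model. This is a Python sketch of the rule, not kernel code: the `Irq` class, the `lines_masked_on_suspend` helper and the IRQ names are hypothetical, only the flag name `IRQF_NO_SUSPEND` comes from the kernel, and its numeric value here should be treated as illustrative.

```python
from dataclasses import dataclass

# Flag name borrowed from the kernel; treat the numeric value as illustrative.
IRQF_NO_SUSPEND = 0x4000


@dataclass
class Irq:
    name: str       # hypothetical label, for illustration only
    flags: int = 0  # request_irq()-style flag bits


def lines_masked_on_suspend(irqs):
    """Return the names of IRQ lines that get masked when the system suspends.

    Model of the rule described above: a line requested with
    IRQF_NO_SUSPEND is left enabled, every other line is masked.
    """
    return [irq.name for irq in irqs if not irq.flags & IRQF_NO_SUSPEND]


# Before the patch: the mailbox irq carries IRQF_NO_SUSPEND and stays live.
before = [Irq("ipcc-mailbox", IRQF_NO_SUSPEND), Irq("some-other-irq")]
# After the patch: the flag is dropped, so the mailbox irq is masked too.
after = [Irq("ipcc-mailbox"), Irq("some-other-irq")]
```

The model deliberately ignores wake-enabled interrupts, which the kernel handles separately; it only shows why the mailbox line was never masked while the flag was set.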

s2idle (the freeze / "fake suspend" path) is disabled: mem_sleep_default=deep is set on the cmdline, and SuspendState is pinned to mem so that a failed deep attempt no-ops instead of falling through to s2idle and wedging in s2idle_enter(), where the edge-triggered PMIC power-key wake is not re-delivered and the device is left unrecoverable.

Testing

Built and tested on the Retroid Pocket 6.

Ran repeated RTC-woken deep suspend/resume cycles, including longer unattended runs. Checked wake sources via /sys/power/pm_wakeup_irq and /sys/kernel/debug/wakeup_sources, and confirmed real time-in-suspend via CLOCK_BOOTTIME vs CLOCK_MONOTONIC drift.
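
For anyone repeating the time-in-suspend check, a minimal sketch of the clock comparison follows. It assumes a Linux host with Python 3.7 or later; the helper names `clock_snapshot` and `suspended_seconds` are mine and are not part of this branch or of the tooling actually used for the runs above.

```python
import time


def clock_snapshot():
    """Read (CLOCK_BOOTTIME, CLOCK_MONOTONIC) in seconds. Linux only."""
    return (
        time.clock_gettime(time.CLOCK_BOOTTIME),
        time.clock_gettime(time.CLOCK_MONOTONIC),
    )


def suspended_seconds(before, after):
    """Time spent suspended between two snapshots.

    CLOCK_BOOTTIME keeps counting while the system is suspended and
    CLOCK_MONOTONIC does not, so the difference of their deltas is the
    time the system was actually down.
    """
    boot_delta = after[0] - before[0]
    mono_delta = after[1] - before[1]
    return boot_delta - mono_delta
```

Take one snapshot before the suspend cycle under test and one after resume, then pass both to `suspended_seconds()`. A result near zero means the device never really went down, whatever the logs say.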

Test results

Deep suspend resumes successfully on the RP6. After the IPCC mask, suspends stayed down cleanly (verified ~14 minute suspends with no spurious wakes, 1681s total across the runs) where before the device self-woke every few minutes on the opcode 0x7 IPCC push. The gamepad works normally after wake.

One cosmetic leftover: rsinput (serial1-0) logs a -110 resume timeout in suspend_stats last_failed_dev, but the gamepad re-inits and works fine after wake.

Additional Context

Scope is SM8550, not gated to one board, since this is SoC-level work and the kernel patches are guarded by DT compatibles anyway. @jaewun validated the foundation on the AYN Thor; I validated the combined stack on the RP6. One board-specific tuning to flag: uic_cmd_timeout=3000 was tuned to the Thor's post-collapse PHY margin. CONFIG_PM_SLEEP_DEBUG is a diagnostic aid, easy to drop if you would rather keep it out of nightly.

Marked draft/WIP until it gets a clean build run against the current next tip.

AI Usage

Did you use AI tools to help write this code? Yes.

AI tools were used during the investigation and packaging: correlating wakeup-source and suspend-stat logs to pin the wake to the opcode 0x7 IPCC push, comparing candidate patches across the SM8550 and SM8650 trees, and assembling this branch. The IPCC fix itself is lifted verbatim from ROCKNIX's SM8750 patch, and the final selection was validated by hardware testing on the RP6.

jaewun and others added 15 commits June 28, 2026 22:36
Enable working deep (S2RAM) suspend on the AYN Thor and fix the issues
found getting there:

- UFS PM-resume: service UTP/UIC completions inline in hardirq during the
  deep-resume PM phase (0201), and rescue a failing resume relink by
  completing the in-flight DME_LINKSTARTUP UIC command (0202).
- ICE: allow explicit votes on the UFS PHY ICE iface clock (0204) and add
  the iface clock + UFS_PHY_GDSC power-domain to the ICE DT node (0205);
  enable inline-crypto-engine / UFS crypto in the kernel config.
- thermal: leave the Thor tsens uplow IRQ as non-wakeup (0203).
- input: reinit the RSInput gamepad MCU on resume from suspend, which
  otherwise loses state and produces no input until re-init (1004).
- quirks: AYN Thor 030-suspend_mode enables mem suspend and wires
  power/suspend/lid keys to logind (HandleLidSwitch=suspend).

…able_irq during PM

Deep-resume relink hard-wedged in ufshcd_disable_irq()'s synchronous
disable_irq()->synchronize_irq(), waiting on the threaded IRQF_ONESHOT UFS irq
handler that isn't scheduled during deep resume. Root-caused via an SDAM
HCE-step breadcrumb (pinned ufshcd_hba_stop entry). Gate the shared wrapper to
disable_irq_nosync() when pm_op_in_progress, covering BOTH PM-reachable callers
on the relink path — ufshcd_hba_stop() and the qcom ufs_qcom_host_reset()
PRE_CHANGE hook — so the wedge can't move from one to the other (caught in
codex review).

ufs_qcom_host_reset() disables the controller IRQ, but the
reset_control_assert/deassert failure returns left it disabled, leaking
the disable depth. That path only became reachable once ufshcd_disable_irq()
stopped synchronizing during PM resume (prior commit). Route both error
exits through a common re-enable so the IRQ depth stays balanced.
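
The balancing idea can be shown with a toy model. This is a Python sketch only, not the C patch: `FakeIrq`, `host_reset` and the callables passed to it are hypothetical stand-ins, and `-5` is just an illustrative negative error code.

```python
class FakeIrq:
    """Hypothetical IRQ line that tracks its disable depth."""

    def __init__(self):
        self.disable_depth = 0

    def disable(self):
        self.disable_depth += 1

    def enable(self):
        self.disable_depth -= 1


def host_reset(irq, assert_reset, deassert_reset):
    """Disable the IRQ, run assert then deassert, and always re-enable.

    `assert_reset` and `deassert_reset` return 0 on success or a
    negative error code. Every exit goes through the single enable()
    below, so the disable depth is balanced on all paths.
    """
    irq.disable()
    err = assert_reset()
    if err == 0:
        err = deassert_reset()
    irq.enable()  # common exit for success and both failure paths
    return err
```

With an early `return err` after either failing step instead, `disable_depth` would stay at 1, which is the leak the commit describes.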

Consolidates the deep-resume relink-rescue into one coherent change (supersedes
the former inline-hardirq, rescue-relink, and no-sync-disable-irq patches).

On this non-MCQ controller the UFS IRQ is threaded IRQF_ONESHOT. When the primary
returns IRQ_WAKE_THREAD the line stays masked until the threaded handler runs; if
that handler is left pending-but-unrunnable across a PM/EH reset (SCHED_FIFO IRQ
thread starved on the single online CPU, and synchronize_irq() would block on it),
the relink's UIC/UTP completions latch in REG_INTERRUPT_STATUS and the relink
hangs -> the device wedges with no watchdog.

Service the relink's completions without depending on the UFS IRQ firing or its
thread being schedulable: drain inline in the ufshcd_intr() PM fast path, and from
an arch-timer poller across both the resume relink and the error-handler
reset_and_restore window (ufshcd_relinking = pm_op_in_progress || relink_poll_active);
mask with disable_irq_nosync while relinking so the reset cannot block on the
pending thread; and force-complete a stalled DME_LINK_STARTUP so link startup can
retry. The dev-init NOP is reaped by the IS-gated drain (IE is full by then).

Validated by RTC deep-suspend soaks (>=120s cold-rail dwell), bare and under a
live gamescope/Steam session.

qcom_ice_resume() enables core_clk then iface_clk; if the iface_clk enable fails
it returned without disabling the core_clk it just enabled. Unwind core_clk on
that error path. Fix on top of the carried "soc: qcom: ice: allow explicit votes
on iface clock" cherry-pick.
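
The unwind can be sketched as follows. This is a Python model of the error path, not the kernel code: `FakeClk` and `ice_resume` are hypothetical names, and `-5` is an illustrative negative error code.

```python
class FakeClk:
    """Hypothetical stand-in for a clock; not a kernel API."""

    def __init__(self, name, fail_on_enable=False):
        self.name = name
        self.fail_on_enable = fail_on_enable
        self.enable_count = 0

    def prepare_enable(self):
        if self.fail_on_enable:
            return -5  # illustrative negative errno-style error
        self.enable_count += 1
        return 0

    def disable_unprepare(self):
        self.enable_count -= 1


def ice_resume(core_clk, iface_clk):
    """Enable core_clk then iface_clk, unwinding core_clk on failure."""
    err = core_clk.prepare_enable()
    if err:
        return err
    err = iface_clk.prepare_enable()
    if err:
        # Without this unwind the core clock would stay enabled.
        core_clk.disable_unprepare()
        return err
    return 0
```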

Combine the rsinput resume-reinit and suspend-quiesce into one PM-ops change. The
MCU streams over the UART; left running across suspend teardown it storms the
geni RX IRQ and trips the spurious-IRQ disable, leaving the gamepad dead after
resume. Quiesce on suspend, re-init on resume.

The gamepad MCU UART storms its geni irq during suspend, tripping a
spurious-disable that wedges the next suspend. Mask the non-console geni irq over
the PM transition.

ufs_qcom_clk_scale_notify() dropped the return value of the POST_CHANGE
hibern8-exit and always returned 0, hiding a failed link-wake from the
clock-scaling rollback path. Capture and return it.

…bern8)

SW clk-gating hibern8 and HW auto-hibern8 both enabled collide: the redundant SW
DME_HIBERNATE_ENTER on an already-parked link never completes (cmd 0x17 timeout
-> -110 -> link broken). Disable HW auto-hibern8 via the quirk.

…pse PHY

After a cold rail-collapse the marginal M-PHY's hibern8 enter late-completes
(~566ms) and -ETIMEDOUTs against the 500ms default. Persist
ufshcd_core.uic_cmd_timeout=3000 via the kernel cmdline (ufshcd is built-in).

SM8550 deep suspend (S2RAM) is now reliable, so promote suspend from the
per-device AYN Thor quirk to the SM8550 platform quirk and apply it across the
whole SoC (AYN Thor, AYANEO Pocket ACE, AYANEO Pocket S 2K).

The power button routes through systemd-logind -> systemd-sleep, which writes
the SuspendState tokens "mem standby freeze" to /sys/power/state in order,
stopping at the first that succeeds. "mem" maps to deep; if the deep attempt
returns an error (e.g. the in-flight power-key edge aborts it) systemd falls
through to "freeze" (s2idle). On SM8550 s2idle parks forever in s2idle_enter():
the edge-triggered PMIC power-key wake is not re-delivered to break the swait
(lost wakeup), and with no hardware watchdog the device is then unrecoverable.
Deep resumes cleanly.

Fix: pin deep-only. The platform quirk writes a sleep.conf.d drop-in that
resets the SuspendState list (it is parsed as a strv, so drop-ins append) and
pins it to "mem" alone, removing the freeze fall-through. A failed deep then
cleanly no-ops instead of wedging. mem_sleep_default=deep is added to the
cmdline as a secondary pin so "mem" can never itself map to s2idle. The drop-in
and power-key binding are written before suspend is enabled to avoid any boot
window where the fall-through list is briefly active.
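
The fall-through and the effect of the pin can be modelled in a few lines. This is a Python sketch of the behaviour described above, not systemd's implementation: `attempt_suspend` and `deep_aborted` are hypothetical names, and the scenario assumes "standby" is not accepted on this platform.

```python
def attempt_suspend(suspend_states, write_state):
    """Try each SuspendState token in order; stop at the first success.

    `write_state(token)` stands in for writing the token to
    /sys/power/state and returns True if the kernel accepted it.
    Returns the token that succeeded, or None if every attempt failed.
    """
    for token in suspend_states:
        if write_state(token):
            return token
    return None


def deep_aborted(token):
    """Hypothetical scenario: the deep attempt errors out, "standby" is
    assumed unsupported, and "freeze" (s2idle) would be entered."""
    return token == "freeze"
```

With the default list the aborted deep attempt ends up in "freeze", which is the wedge. With the list pinned to "mem" alone the same failure returns `None`, so the suspend request simply no-ops.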

This enables suspend on the AYANEO Pocket ACE and Pocket S 2K, which share the
SoC but have not been individually validated for suspend; the deep-only pin is
strictly safer than the previous fall-through and the kernel suspend fixes
already apply SoC-wide.

- platforms/SM8550/030-suspend_mode: enable deep-only suspend (was disabled)
- devices/AYN Thor/030-suspend_mode: removed (now handled by the platform)
- devices/SM8550/options: add mem_sleep_default=deep

Add 1009-input-rsinput-handshake-mcu-version-on-init.patch.

On resume the gamepad MCU is re-powered and rsinput_init_commands() sent the
version request and the framed-report-mode params command after fixed msleep()
delays. The MCU's post-power boot time varies, so on an unlucky resume the
params command landed before the MCU was ready, was lost, and the MCU stayed in
its default free-running stream -> rsinput RX buffer overflows and the gamepad
is dead until a later suspend/resume happens to re-init in a ready window.

Replace the blind delay with a handshake: send the version request and wait for
the MCU's async version reply (delivered via the serdev rx callback) before
sending params, with bounded retries and a timeout. Fixes the intermittent
post-resume gamepad-dead / RX-overflow race.
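
The shape of the handshake, as opposed to a blind delay, looks roughly like this. It is a Python sketch under assumptions, not the driver code: `send`, `replies`, and the `"version?"`, `"version"` and `"params"` strings are hypothetical stand-ins for the serdev write, the rx callback and the real MCU frames, and the retry count and timeout are arbitrary.

```python
import queue


def handshake_then_send_params(send, replies, retries=3, timeout_s=0.1):
    """Send a version request and wait for the reply before sending params.

    `send(cmd)` transmits a command; `replies` is a queue.Queue that the
    receive path fills with incoming messages. Both are hypothetical
    stand-ins for the driver's serdev write and rx callback.
    Returns True once params were sent, False if every retry timed out.
    """
    for _ in range(retries):
        send("version?")
        try:
            reply = replies.get(timeout=timeout_s)
        except queue.Empty:
            continue  # MCU not ready yet; retry the request
        if reply == "version":
            send("params")
            return True
    return False
```

The params command is only sent after the MCU has proven it is listening, so a slow boot costs a retry instead of a lost command.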

jaewun's patch scoped the of_machine_is_compatible() check to ayn,thor.
RP6's root compatible is "retroidpocket,rp6", "qcom,qcs8550", "qcom,sm8550"
with no ayn,thor entry, so the fix silently did nothing there. Broaden the
check to the SoC-level compatible so it covers every SM8550 board.
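
Why the narrower match missed the RP6 is easy to see in a model. This is a Python sketch of an of_machine_is_compatible()-style lookup, not the kernel helper itself; `machine_is_compatible` is a hypothetical name and the comparison is simplified to exact string membership.

```python
def machine_is_compatible(root_compatible, wanted):
    """Model of an of_machine_is_compatible()-style check: true when
    `wanted` appears anywhere in the root node's compatible list."""
    return wanted in root_compatible


# Root compatible list for the RP6 as quoted in the commit message.
RP6_COMPATIBLE = ["retroidpocket,rp6", "qcom,qcs8550", "qcom,sm8550"]
```

Checking for "ayn,thor" against that list is false, while "qcom,sm8550" matches, and it would match any other board that lists the SoC-level compatible.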

Exposes the last wakeup IRQ source on resume, for diagnosing spurious
or unexpected wakeups.

The ADSP charger firmware pushes an unsolicited BATTMGR_NOTIFICATION
(opcode 0x7) about 0.5s after suspend entry. It rings the IPCC mailbox
irq, which upstream is IRQF_NO_SUSPEND, so it is never masked and wakes
the device every few minutes. Drop IRQF_NO_SUSPEND so the mailbox irq is
masked across system suspend. Lifted from ROCKNIX SM8750 (AYN Odin 3),
which shares the battmgr/pmic_glink charger model and sleeps cleanly.