WIP: SM8550 deep suspend/resume #2954
Draft
shuuri-labs wants to merge 15 commits into
Enable working deep (S2RAM) suspend on the AYN Thor and fix the issues found getting there:
- UFS PM-resume: service UTP/UIC completions inline in hardirq during the deep-resume PM phase (0201), and rescue a failing resume relink by completing the in-flight DME_LINKSTARTUP UIC command (0202).
- ICE: allow explicit votes on the UFS PHY ICE iface clock (0204) and add the iface clock + UFS_PHY_GDSC power-domain to the ICE DT node (0205); enable inline-crypto-engine / UFS crypto in the kernel config.
- thermal: leave the Thor tsens uplow IRQ as non-wakeup (0203).
- input: reinit the RSInput gamepad MCU on resume from suspend, which otherwise loses state and produces no input until re-init (1004).
- quirks: AYN Thor 030-suspend_mode enables mem suspend and wires power/suspend/lid keys to logind (HandleLidSwitch=suspend).
…able_irq during PM
Deep-resume relink hard-wedged in ufshcd_disable_irq()'s synchronous disable_irq()->synchronize_irq(), waiting on the threaded IRQF_ONESHOT UFS irq handler that isn't scheduled during deep resume. Root-caused via an SDAM HCE-step breadcrumb (pinned ufshcd_hba_stop entry). Gate the shared wrapper to disable_irq_nosync() when pm_op_in_progress, covering BOTH PM-reachable callers on the relink path (ufshcd_hba_stop() and the qcom ufs_qcom_host_reset() PRE_CHANGE hook) so the wedge can't move from one to the other (caught in codex review).
ufs_qcom_host_reset() disables the controller IRQ, but the reset_control_assert/deassert failure returns left it disabled, leaking the disable depth. That path only became reachable once ufshcd_disable_irq() stopped synchronizing during PM resume (prior commit). Route both error exits through a common re-enable so the IRQ depth stays balanced.
Consolidates the deep-resume relink-rescue into one coherent change (supersedes the former inline-hardirq, rescue-relink, and no-sync-disable-irq patches). On this non-MCQ controller the UFS IRQ is threaded IRQF_ONESHOT. When the primary returns IRQ_WAKE_THREAD the line stays masked until the threaded handler runs; if that handler is left pending-but-unrunnable across a PM/EH reset (SCHED_FIFO IRQ thread starved on the single online CPU, and synchronize_irq() would block on it), the relink's UIC/UTP completions latch in REG_INTERRUPT_STATUS and the relink hangs -> the device wedges with no watchdog. Service the relink's completions without depending on the UFS IRQ firing or its thread being schedulable: drain inline in the ufshcd_intr() PM fast path, and from an arch-timer poller across both the resume relink and the error-handler reset_and_restore window (ufshcd_relinking = pm_op_in_progress || relink_poll_active); mask with disable_irq_nosync while relinking so the reset cannot block on the pending thread; and force-complete a stalled DME_LINK_STARTUP so link startup can retry. The dev-init NOP is reaped by the IS-gated drain (IE is full by then). Validated by RTC deep-suspend soaks (>=120s cold-rail dwell), bare and under a live gamescope/Steam session.
qcom_ice_resume() enables core_clk then iface_clk; if the iface_clk enable fails it returned without disabling the core_clk it just enabled. Unwind core_clk on that error path. Fix on top of the carried "soc: qcom: ice: allow explicit votes on iface clock" cherry-pick.
Combine the rsinput resume-reinit and suspend-quiesce into one PM-ops change. The MCU streams over the UART; left running across suspend teardown it storms the geni RX IRQ and trips the spurious-IRQ disable, leaving the gamepad dead after resume. Quiesce on suspend, re-init on resume.
The gamepad MCU UART storms its geni irq during suspend, tripping a spurious-disable that wedges the next suspend. Mask the non-console geni irq over the PM transition.
ufs_qcom_clk_scale_notify() dropped the return value of the POST_CHANGE hibern8-exit and always returned 0, hiding a failed link-wake from the clock-scaling rollback path. Capture and return it.
…bern8)
SW clk-gating hibern8 and HW auto-hibern8 both enabled collide: the redundant SW DME_HIBERNATE_ENTER on an already-parked link never completes (cmd 0x17 timeout -> -110 -> link broken). Disable HW auto-hibern8 via the quirk.
…pse PHY
After a cold rail-collapse the marginal M-PHY's hibern8 enter late-completes (~566ms) and -ETIMEDOUTs against the 500ms default. Persist ufshcd_core.uic_cmd_timeout=3000 via the kernel cmdline (ufshcd is built-in).
SM8550 deep suspend (S2RAM) is now reliable, so promote suspend from the per-device AYN Thor quirk to the SM8550 platform quirk and apply it across the whole SoC (AYN Thor, AYANEO Pocket ACE, AYANEO Pocket S 2K).
The power button routes through systemd-logind -> systemd-sleep, which writes the SuspendState tokens "mem standby freeze" to /sys/power/state in order, stopping at the first that succeeds. "mem" maps to deep; if the deep attempt returns an error (e.g. the in-flight power-key edge aborts it) systemd falls through to "freeze" (s2idle). On SM8550 s2idle parks forever in s2idle_enter(): the edge-triggered PMIC power-key wake is not re-delivered to break the swait (lost wakeup), and with no hardware watchdog the device is then unrecoverable. Deep resumes cleanly.
Fix: pin deep-only. The platform quirk writes a sleep.conf.d drop-in that resets the SuspendState list (it is parsed as a strv, so drop-ins append) and pins it to "mem" alone, removing the freeze fall-through. A failed deep then cleanly no-ops instead of wedging. mem_sleep_default=deep is added to the cmdline as a secondary pin so "mem" can never itself map to s2idle. The drop-in and power-key binding are written before suspend is enabled to avoid any boot window where the fall-through list is briefly active.
This enables suspend on the AYANEO Pocket ACE and Pocket S 2K, which share the SoC but have not been individually validated for suspend; the deep-only pin is strictly safer than the previous fall-through and the kernel suspend fixes already apply SoC-wide.
- platforms/SM8550/030-suspend_mode: enable deep-only suspend (was disabled)
- devices/AYN Thor/030-suspend_mode: removed (now handled by the platform)
- devices/SM8550/options: add mem_sleep_default=deep
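The fall-through described above can be modelled in a few lines. This is only a sketch of the "try each token in order, stop at the first success" behaviour as the commit message describes it, not systemd's code: `try_suspend` and `make_kernel` are hypothetical names, and the fake kernel assumes `standby` is unsupported and `freeze` is always accepted.

```python
from typing import Callable, Iterable, Optional


def try_suspend(states: Iterable[str],
                write_state: Callable[[str], bool]) -> Optional[str]:
    """Try each sleep state in order; return the first one accepted, else None."""
    for state in states:
        if write_state(state):
            return state
    return None


def make_kernel(deep_fails: bool) -> Callable[[str], bool]:
    """Hypothetical stand-in for writing a token to /sys/power/state."""
    def write_state(state: str) -> bool:
        if state == "mem":
            # A deep attempt can be aborted, e.g. by an in-flight power-key edge.
            return not deep_fails
        if state == "standby":
            return False  # assumed unsupported in this model
        return state == "freeze"  # s2idle is always accepted in this model
    return write_state


default_list = ["mem", "standby", "freeze"]
pinned_list = ["mem"]

# With the default list, a failed deep attempt lands in "freeze" (s2idle).
fell_through = try_suspend(default_list, make_kernel(deep_fails=True))
# With the pinned list, the same failure is a clean no-op (None).
pinned = try_suspend(pinned_list, make_kernel(deep_fails=True))
```

In this model the only difference between the wedge and the no-op is the length of the list, which is why the drop-in has to reset the list rather than append to it.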
Add 1009-input-rsinput-handshake-mcu-version-on-init.patch. On resume the gamepad MCU is re-powered and rsinput_init_commands() sent the version request and the framed-report-mode params command after fixed msleep() delays. The MCU's post-power boot time varies, so on an unlucky resume the params command landed before the MCU was ready, was lost, and the MCU stayed in its default free-running stream -> rsinput RX buffer overflows and the gamepad is dead until a later suspend/resume happens to re-init in a ready window. Replace the blind delay with a handshake: send the version request and wait for the MCU's async version reply (delivered via the serdev rx callback) before sending params, with bounded retries and a timeout. Fixes the intermittent post-resume gamepad-dead / RX-overflow race.
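The shape of that handshake, outside the kernel, looks roughly like the following. It is a sketch under stated assumptions rather than the driver's code: the command bytes, `wait_for_version`, and `FakeMcu` are all hypothetical, and a `queue.Queue` stands in for the reply delivered by the serdev rx callback.

```python
import queue
from typing import Callable

VERSION_REQ = b"VER?"  # hypothetical command bytes, not the real protocol


def wait_for_version(send: Callable[[bytes], None],
                     replies: "queue.Queue[bytes]",
                     retries: int = 5,
                     timeout_s: float = 0.05) -> bool:
    """Send the version request until a reply arrives or retries run out."""
    for _ in range(retries):
        send(VERSION_REQ)
        try:
            replies.get(timeout=timeout_s)  # reply is queued by the RX callback
            return True
        except queue.Empty:
            continue  # MCU not ready yet; ask again
    return False


class FakeMcu:
    """Hypothetical MCU that ignores its first `boot_requests` requests."""

    def __init__(self, boot_requests: int, replies: "queue.Queue[bytes]") -> None:
        self.boot_requests = boot_requests
        self.replies = replies
        self.seen = 0

    def send(self, data: bytes) -> None:
        self.seen += 1
        if self.seen > self.boot_requests:
            self.replies.put(b"VER 1.0")  # made-up reply payload


rx = queue.Queue()
mcu = FakeMcu(boot_requests=2, replies=rx)
ready = wait_for_version(mcu.send, rx, retries=5, timeout_s=0.01)
```

The params command would only be sent once `ready` is true, so a slow MCU boot costs extra requests instead of a lost command.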
jaewun's patch scoped the of_machine_is_compatible() check to ayn,thor. RP6's root compatible is "retroidpocket,rp6", "qcom,qcs8550", "qcom,sm8550" with no ayn,thor entry, so the fix silently did nothing there. Broaden the check to the SoC-level compatible so it covers every SM8550 board.
Exposes the last wakeup IRQ source on resume, for diagnosing spurious or unexpected wakeups.
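For testers, a small helper along these lines can read that value and map it to a name. This is a sketch: `read_wakeup_irq` and `irq_name` are hypothetical helpers, the sysfs file only exists when CONFIG_PM_SLEEP_DEBUG is enabled, and the lookup assumes the IRQ description is the last field of the matching `/proc/interrupts` row.

```python
from pathlib import Path
from typing import Optional


def read_wakeup_irq(path: str = "/sys/power/pm_wakeup_irq") -> Optional[int]:
    """Return the recorded wakeup IRQ number, or None if none is available."""
    try:
        text = Path(path).read_text().strip()
    except OSError:
        # File missing (debug option off) or the read failed because
        # no wakeup IRQ was recorded for the last cycle.
        return None
    return int(text) if text else None


def irq_name(irq: int, interrupts_path: str = "/proc/interrupts") -> Optional[str]:
    """Return the last field of the /proc/interrupts row for `irq`, if any."""
    prefix = f"{irq}:"
    for line in Path(interrupts_path).read_text().splitlines():
        fields = line.split()
        if fields and fields[0] == prefix:
            return fields[-1]
    return None
```

Calling `read_wakeup_irq()` right after a resume and passing the result to `irq_name()` gives a quick first guess at which device woke the system.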
The ADSP charger firmware pushes an unsolicited BATTMGR_NOTIFICATION (opcode 0x7) about 0.5s after suspend entry. It rings the IPCC mailbox irq, which upstream is IRQF_NO_SUSPEND, so it is never masked and wakes the device every few minutes. Drop IRQF_NO_SUSPEND so the mailbox irq is masked across system suspend. Lifted from ROCKNIX SM8750 (AYN Odin 3), which shares the battmgr/pmic_glink charger model and sleeps cleanly.
Summary
This PR enables the `deep` suspend path (S2RAM) for SM8550, with testing focused on the Retroid Pocket 6.
The foundation is @jaewun's SM8550 suspend/resume series, cherry-picked with his authorship intact: UFS PM-resume completion servicing and out-of-band relink drain, hibern8 vs clk-gating collisions, IRQ balancing on error paths, ICE crypto clock plumbing, geni UART irq masking, gamepad MCU quiesce/reinit, deep-only suspend, and the rsinput version handshake. This is his tested lineage, not the later thor-suspend-fixes-v1 reroll, which reworks the UFS relink handling and drops the ICE core-clk unwind; I have not built or tested that.
On top I add three commits:
- Broaden the `of_machine_is_compatible()` check from `ayn,thor` to `qcom,sm8550` so it also covers the RP6
- Enable `CONFIG_PM_SLEEP_DEBUG` so testers can read `/sys/power/pm_wakeup_irq`
- Drop `IRQF_NO_SUSPEND` from the IPCC mailbox irq so it is masked across system suspend
The IPCC change is the wake fix for the RP6. The ADSP charger firmware pushes an unsolicited `BATTMGR_NOTIFICATION` (opcode 0x7) about 0.5s after suspend entry. It rings the IPCC mailbox irq, which is `IRQF_NO_SUSPEND` upstream, so it is never masked and wakes the device every few minutes. Dropping `IRQF_NO_SUSPEND` masks it across suspend. SM8750 already ships this; SM8550 was just missing it.
`s2idle` (the freeze / "fake suspend" path) is disabled: `mem_sleep_default=deep` on the cmdline, and `SuspendState` pinned to `mem` so a failed deep attempt no-ops instead of wedging in `s2idle_enter()`, where the edge-triggered PMIC power-key wake is not re-delivered and the device is left unrecoverable.
Testing
Built and tested on the Retroid Pocket 6.
Ran repeated RTC-woken `deep` suspend/resume cycles, including longer unattended runs. Checked wake sources via `/sys/power/pm_wakeup_irq` and `/sys/kernel/debug/wakeup_sources`, and confirmed real time-in-suspend via `CLOCK_BOOTTIME` vs `CLOCK_MONOTONIC` drift.
Test results
Deep suspend resumes successfully on the RP6. After the IPCC mask, suspends stayed down cleanly (verified ~14 minute suspends with no spurious wakes, 1681s total across the runs) where before the device self-woke every few minutes on the opcode 0x7 IPCC push. The gamepad works normally after wake.
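For anyone reproducing the time-in-suspend check, the `CLOCK_BOOTTIME` vs `CLOCK_MONOTONIC` comparison can be sketched as below. This is a Linux-only illustration, not the script used for these runs: `suspended_seconds` and `time_in_suspend` are hypothetical helpers, and the sketch relies on `CLOCK_BOOTTIME` continuing to count while the system is suspended whereas `CLOCK_MONOTONIC` does not.

```python
import time
from typing import Callable


def suspended_seconds() -> float:
    """Approximate total seconds spent suspended since boot (Linux only)."""
    # Read MONOTONIC first so the small gap between the two reads
    # can only make the difference larger, never negative.
    mono = time.clock_gettime(time.CLOCK_MONOTONIC)
    boot = time.clock_gettime(time.CLOCK_BOOTTIME)
    return boot - mono


def time_in_suspend(action: Callable[[], None]) -> float:
    """Run `action` (e.g. something that suspends) and return the suspend time it added."""
    before = suspended_seconds()
    action()
    return suspended_seconds() - before
```

If a suspend attempt silently fell back to a no-op, `time_in_suspend` stays near zero even though the wall clock advanced, which is what makes the drift a useful cross-check.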
One cosmetic leftover: `rsinput` (`serial1-0`) logs a `-110` resume timeout in `suspend_stats` `last_failed_dev`, but the gamepad re-inits and works fine after wake.
Additional Context
Scope is SM8550, not gated to one board, since this is SoC-level work and the kernel patches are guarded by DT compatibles anyway. @jaewun validated the foundation on the AYN Thor; I validated the combined stack on the RP6. One board-specific tuning to flag:
`uic_cmd_timeout=3000` was tuned to the Thor's post-collapse PHY margin. `CONFIG_PM_SLEEP_DEBUG` is a diagnostic aid, easy to drop if you would rather keep it out of nightly.
Marked draft/WIP until it gets a clean build run against the current `next` tip.
AI Usage
Did you use AI tools to help write this code? Yes.
AI tools were used during the investigation and packaging: correlating wakeup-source and suspend-stat logs to pin the wake to the opcode 0x7 IPCC push, comparing candidate patches across the SM8550 and SM8650 trees, and assembling this branch. The IPCC fix itself is lifted verbatim from ROCKNIX's SM8750 patch, and the final selection was validated by hardware testing on the RP6.