[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z5tJvgrYS5UoAHRD@hovoldconsulting.com>
Date: Thu, 30 Jan 2025 10:43:26 +0100
From: Johan Hovold <johan@...nel.org>
To: Abel Vesa <abel.vesa@...aro.org>
Cc: Bjorn Andersson <andersson@...nel.org>,
Konrad Dybcio <konradybcio@...nel.org>,
Caleb Connolly <caleb.connolly@...aro.org>,
Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
Neil Armstrong <neil.armstrong@...aro.org>,
linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
Johan Hovold <johan+linaro@...nel.org>, stable@...r.kernel.org
Subject: Re: [PATCH RFC v2] soc: qcom: pmic_glink: Fix device access from
worker during suspend
On Wed, Jan 29, 2025 at 01:46:15PM +0200, Abel Vesa wrote:
> For historical reasons, the GLINK smem interrupt is registered with
Please spell out these reasons so that we can determine if they are
still valid or not.
> IRRQF_NO_SUSPEND flag set, which is the underlying problem here, since the
> incoming messages can be delivered during late suspend and early
> resume.
>
> In this specific case, the pmic_glink_altmode_worker() currently gets
> scheduled on the system_wq which can be scheduled to run while devices
> are still suspended. This proves to be a problem when a Type-C retimer,
> switch or mux that is controlled over a bus like I2C, because the I2C
> controller is suspended.
This is not just an issue with an i2c retimer, this is a plain generic
bug in that the glink code which is calling drivers while their devices
are suspended.
> This has been proven to be the case on the X Elite boards where such
> retimers (ParadeTech PS8830) are used in order to handle Type-C
> orientation and altmode configuration. The following warning is thrown:
> The proper solution here should be to not deliver these kind of messages
> during system suspend at all, or at least make it configurable per glink
> client. But simply dropping the IRQF_NO_SUSPEND flag entirely will break
> other clients.
Which clients? Do we care enough about them to be papering over this
very real bug which can potentially crash the system (e.g. due to
unclocked register accesses)?
> The final shape of the rework of the pmic glink driver in
> order to fulfill both the filtering of the messages that need to be able
> to wake-up the system and the queueing of these messages until the system
> has properly resumed is still being discussed and it is planned as a
> future effort.
>
> Meanwhile, the stop-gap fix here is to schedule the pmic glink altmode
> worker on the system_freezable_wq instead of the system_wq. This will
> result in the altmode worker not being scheduled to run until the
> devices are resumed first, which will give the controllers like I2C a
> chance to resume before the transfer is requested.
This is incomplete at best, even as a stop gap. I just threw a
dump_stack() in qmp_combo_typec_switch_set() and see that it is being
called four times (don't ask me why) from two different workers on
hotplug.
Even if you use the freezable workqueue for altmode notifications, you
can still end up here via the ucsi notifications, and that now also
seems to generate a warning in the PM code during resume.
[ 25.684039] Call trace:
[ 25.686633] show_stack+0x18/0x24 (C)
[ 25.690492] dump_stack_lvl+0xc0/0xd0
[ 25.694350] dump_stack+0x18/0x24
[ 25.697851] qmp_combo_typec_switch_set+0x28/0x210 [phy_qcom_qmp_combo]
[ 25.704764] typec_switch_set+0x58/0x90 [typec]
[ 25.709525] ps883x_sw_set+0x2c/0x98 [ps883x]
[ 25.714092] typec_switch_set+0x58/0x90 [typec]
[ 25.718861] typec_set_orientation+0x24/0x6c [typec]
[ 25.724068] pmic_glink_ucsi_connector_status+0x5c/0x88 [ucsi_glink]
[ 25.730726] ucsi_handle_connector_change+0xc4/0x448 [typec_ucsi]
[ 25.737099] process_one_work+0x20c/0x610
[ 25.741322] worker_thread+0x23c/0x378
[ 25.745274] kthread+0x124/0x128
[ 25.748680] ret_from_fork+0x10/0x20
[ 25.753631] typec port4-partner: PM: parent port4 should not be sleeping
When you first brought up the possible workaround of using a freezable
workqueue, I mistakenly thought this would apply to all glink messages
(I realise that would defeat the purpose of allowing some early
notifications).
You've also found that the glink interrupts are waking up the system
unconditionally on battery notifications, which would be yet another
reason for disabling the interrupts (or implementing some kind of
masking).
I'm leaning towards just disabling the glink interrupts during suspend,
but that depends a bit on what (historical?) functionality actually need
them.
Johan
Powered by blists - more mailing lists