[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1f82ad2a-fdb0-41e6-a9cd-02f7ad2978de@linaro.org>
Date: Fri, 7 Feb 2025 15:28:27 +0100
From: Caleb Connolly <caleb.connolly@...aro.org>
To: Abel Vesa <abel.vesa@...aro.org>, Bjorn Andersson <andersson@...nel.org>
Cc: Krzysztof Kozlowski <krzk@...nel.org>,
Konrad Dybcio <konradybcio@...nel.org>,
Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
Johan Hovold <johan@...nel.org>, Neil Armstrong <neil.armstrong@...aro.org>,
linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH RFC] soc: qcom: pmic_glink: Fix device access from worker
during suspend
On 1/29/25 09:46, Abel Vesa wrote:
> On 25-01-11 13:22:12, Bjorn Andersson wrote:
>> On Sat, Jan 11, 2025 at 04:35:09PM +0100, Krzysztof Kozlowski wrote:
>>> On 10/01/2025 16:29, Abel Vesa wrote:
>>>> The pmic_glink_altmode_worker() currently gets scheduled on the system_wq.
>>>> When the system is suspended (s2idle), the fact that the worker can be
>>>> scheduled to run while devices are still suspended provesto be a problem
>>>> when a Type-C retimer, switch or mux that is controlled over a bus like
>>>> I2C, because the I2C controller is suspended.
>>>>
>>>> This has been proven to be the case on the X Elite boards where such
>>>> retimers (ParadeTech PS8830) are used in order to handle Type-C
>>>> orientation and altmode configuration. The following warning is thrown:
>>>>
>>>> [ 35.134876] i2c i2c-4: Transfer while suspended
>>>> [ 35.143865] WARNING: CPU: 0 PID: 99 at drivers/i2c/i2c-core.h:56 __i2c_transfer+0xb4/0x57c [i2c_core]
>>>> [ 35.352879] Workqueue: events pmic_glink_altmode_worker [pmic_glink_altmode]
>>>> [ 35.360179] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>>>> [ 35.455242] Call trace:
>>>> [ 35.457826] __i2c_transfer+0xb4/0x57c [i2c_core] (P)
>>>> [ 35.463086] i2c_transfer+0x98/0xf0 [i2c_core]
>>>> [ 35.467713] i2c_transfer_buffer_flags+0x54/0x88 [i2c_core]
>>>> [ 35.473502] regmap_i2c_write+0x20/0x48 [regmap_i2c]
>>>> [ 35.478659] _regmap_raw_write_impl+0x780/0x944
>>>> [ 35.483401] _regmap_bus_raw_write+0x60/0x7c
>>>> [ 35.487848] _regmap_write+0x134/0x184
>>>> [ 35.491773] regmap_write+0x54/0x78
>>>> [ 35.495418] ps883x_set+0x58/0xec [ps883x]
>>>> [ 35.499688] ps883x_sw_set+0x60/0x84 [ps883x]
>>>> [ 35.504223] typec_switch_set+0x48/0x74 [typec]
>>>> [ 35.508952] pmic_glink_altmode_worker+0x44/0x1fc [pmic_glink_altmode]
>>>> [ 35.515712] process_scheduled_works+0x1a0/0x2d0
>>>> [ 35.520525] worker_thread+0x2a8/0x3c8
>>>> [ 35.524449] kthread+0xfc/0x184
>>>> [ 35.527749] ret_from_fork+0x10/0x20
>>>>
>>>> The solution here is to schedule the altmode worker on the system_freezable_wq
>>>> instead of the system_wq. This will result in the altmode worker not being
>>>> scheduled to run until the devices are resumed first, which will give the
>>>> controllers like I2C a chance to resume before the transfer is requested.
>>>>
>>>> Fixes: 080b4e24852b ("soc: qcom: pmic_glink: Introduce altmode support")
>>>> Cc: stable@...r.kernel.org # 6.3
>>>> Signed-off-by: Abel Vesa <abel.vesa@...aro.org>
>>>
>>> This is an incomplete fix, I think. You fix one case but several other
>>> possibilities are still there:
>>>
>>
>> I agree, this whacks only one mole, but it's reasonable to expect that
>> there are more hidden here.
>>
>>> 1. Maybe the driver just lacks proper suspend/resume handling?
>>> I assume all this happens during system suspend, so what certainty you
>>> have that your second work - pmic_glink_altmode_pdr_notify() - is not
>>> executed as well?
>>>
>>> 2. Follow up: all other drivers and all other future use cases will be
>>> affected as well. Basically what this patch is admitting is that driver
>>> can be executed anytime, even during suspend, so each call of
>>> pmic_glink_send() has to be audited. Now and in the future, because what
>>> stops some developer of adding one more path calling pmic_glink_send(),
>>> which also turns out to be executed during suspend?
>>>
>>> 3. So qcom_battmgr.c is buggy as well?
>>>
>>> 4. ucsi_glink.c? I don't see handling suspend, either...
>>>
>>> Maybe the entire problem is how pmic glink was designed: not as proper
>>> bus driver which handles both child-parent relationship and system suspend.
>>
>> The underlying problem is that GLINK register its interrupt as
>> IRQF_NO_SUSPEND (for historical reasons) and as such incoming messages
>> will be delivered in late suspend and early resume. In this specific
>> case, a specific message is handled by pmic_glink_altmode_callback(), by
>> invoking schedule_work() which in this case happens to schedule
>> pmic_glink_altmode_worker before we've resumed the I2C controller.
>>
>> I presume with your suggestion about a pmic_glink bus driver we'd come
>> up with some mechanism for pmic_glink to defer these messages until
>> resume has happened?
>
> So is the suggestion here to rework the entire pmic_glink into a bus
> driver? (I like the sound of that)
>
> I'd assume the new bus driver will have to queue the messages until it
> has resumed, which is fine.
>
> But still doesn't solve the fact that we can't filter out when to
> wake-up or not. What am I missing here?
Some kind of mechanism for devices on the GLINK bus to either:
* Have a callback which gets called even in suspend to "peek" messages
and either drop them, defer until resume, or trigger a wakeup. This
could call all the way into battmgr which could determine if a power
cable was just plugged in for example.
* More ideally, if the message format is consistent/simple enough they
could just tell the GLINK bus to wake up if certain bytes have certain
values in a message, not sure if the firmware is nice enough to make
this viable though.
>
>>
>> As you suggest, I too suspect that we have more of these hidden in other
>> rpmsg client drivers.
>>
>>
>> In the discussions leading up to this patch we agreed that a better
>> solution would be to change GLINK (SMEM) to not deliver messages when
>> the system is suspended. But as this has an impact on how GLINK may or
>> may not wake up the system, Abel's fix is a reasonable stop-gap solution
>> while we work that out.
>>
>> That said, this backstory, the description of the actual underlying
>> problem, the planned longevity (shortgevity?) of this fix are missing
>> from the commit message. As written, we could expect a good Samaritan to
>> come in and replicate this fix across all those other use cases,
>> contrary to the agreed plans.
>>
>>
>> @Abel, can you please make sure that your commit message captures those
>> aspects as well?
>
> Sure, will through the backstory and the reason this is a temporary fix
> with the hope that this is all going to be reworked soon.
>
>>
>> Regards,
>> Bjorn
>
> Thanks for reviewing,
>
> Abel
--
Caleb (they/them)
Powered by blists - more mailing lists