[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c2c56a48-9d0e-4a0e-acd8-94af5d80460f@ti.com>
Date: Thu, 10 Jul 2025 19:09:09 -0500
From: Kendall Willis <k-willis@...com>
To: Nishanth Menon <nm@...com>
CC: <kristo@...nel.org>, <ssantosh@...nel.org>,
<linux-arm-kernel@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
<ulf.hansson@...aro.org>, <vigneshr@...com>, <d-gole@...com>,
<vishalm@...com>, <sebin.francis@...com>, <msp@...libre.com>,
<khilman@...libre.com>
Subject: Re: [PATCH v3] firmware: ti_sci: Enable abort handling of entry to
LPM
On 7/10/25 00:44, Nishanth Menon wrote:
> On 17:16-20250709, Kendall Willis wrote:
>> The PM co-processor (device manager or DM) adds the ability to abort
>> entry to a low power mode by clearing the mode selection in the
>> latest version of its firmware (11.x). The following power management
>> operation defined in the TISCI Low Power Mode API [1] is implemented to
>> enable aborting entry to LPM:
>>
>> TISCI_MSG_LPM_ABORT
>> Abort the current low power mode entry by clearing the current mode
>> selection.
>>
>> Introduce LPM abort call that enables the ti_sci driver to support abort
>> by clearing the low power mode selection of the DM. This fixes behavior
>> from the DM where if system suspend failed, the next time system suspend
>> is entered, it will fail because DM did not have the low power mode
>> selection cleared. Instead of this behavior, the low power mode selection
>> will be cleared after Linux resume which will allow subsequent system
>> suspends to work correctly.
>>
>> When Linux suspends, the TI SCI ->suspend() call will send a prepare_sleep
>> message to the DM. The DM will choose what low power mode to enter once
>> Linux is suspended based on constraints given by devices in the TI SCI PM
>> domain. After system suspend completes, regardless of if system suspend
>> succeeds or fails, the ->complete() hook in TI SCI will be called. In the
>> ->complete() hook, a message will be sent to the DM to clear the current
>> low power mode selection. This is necessary because if suspend fails, the
>> low power mode selection in the DM is not cleared and the next system
>> suspend will fail due to the low power mode not having been cleared from
>> the previous failed system suspend.
>>
>> Clearing the mode selection unconditionally acts as a cleanup from sending
>> the prepare_sleep message in ->suspend(). The DM already clears the low
>> power selection automatically when resuming from system suspend. If
>> suspend/resume executed without failure, clearing the low power mode
>> selection will not cause an error in the DM.
>>
>> The flow for the abort sequence is the following:
>> 1. User sends a command to enter sleep
>> 2. Linux starts to suspend drivers
>> 3. TI SCI suspends and sends prepare_sleep message to DM
>> 4. A driver fails to suspend
>> 5. Linux resumes the drivers that have already suspended
>> 6. Linux sends DM to clear the current low power mode selection from
>> TI SCI ->complete() hook
>> 7. DM aborts LPM entry by clearing the current mode selection
>> 8. Linux works as normal
>
> Could we trim the message a bit down? it is informative, thanks.. but I
> think a bit repetitive.
Will fix in v4.
>
>>
>> [1] https://software-dl.ti.com/tisci/esd/latest/2_tisci_msgs/pm/lpm.html
>>
>> Signed-off-by: Kendall Willis <k-willis@...com>
>> ---
>> Series has been tested on an SK-AM62B-P1 board. Normal suspend/resume
>> has been verified. Abort was tested by adding an error into the TI SCI
>> suspend hook.
>
> btw, does this handle the noirq case as well? I have'nt looked closely
> at the sequence to be sure.
It does. I tested adding an error into the TI SCI suspend_noirq hook
using this patch on top of latest TI SDK [1]. Abort worked. I was not
able to test with kernel v6.16 next because when I added an error into
TI SCI suspend_noirq hook, Linux would not resume.
[1]
https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-6.12.y-cicd
>
>>
>> Link to v2:
>> https://lore.kernel.org/all/20250709205332.2235072-1-k-willis@ti.com/
>> Link to v1:
>> https://lore.kernel.org/all/20250627204821.1150459-1-k-willis@ti.com/
>>
>> Changes from v2 to v3:
>> - added links to previous series and the changes between them
>
> Thanks, but in the future, I'd rather not want a v3, but just reply
> with the missing info and better still, add to your pre-send checklist
> to ensure you don't miss it in the future ;).
>
>
Noted, will definitely add to my own checklist.
>>
>> Changes from v1 to v2:
>> - rebase on linux-next
>> - drop the following patch:
>> pmdomain: ti_sci: Add LPM abort sequence to suspend path
>> - remove lpm_abort from ti_sci_pm_ops
>> - add ->complete() hook with ti_sci_cmd_lpm_abort to be called
>> unconditionally within it
>> - remove ti_sci_cmd_lpm_abort from the ->suspend() and
>> ->suspend_noirq() hooks
>> - reword commit message
>> ---
>> drivers/firmware/ti_sci.c | 61 +++++++++++++++++++++++++++++++++++++++
>> drivers/firmware/ti_sci.h | 3 +-
>> 2 files changed, 63 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/firmware/ti_sci.c b/drivers/firmware/ti_sci.c
>> index ae5fd1936ad32..63c405f7037f0 100644
>> --- a/drivers/firmware/ti_sci.c
>> +++ b/drivers/firmware/ti_sci.c
>> @@ -2015,6 +2015,58 @@ static int ti_sci_cmd_set_latency_constraint(const struct ti_sci_handle *handle,
>> return ret;
>> }
>>
>> +/**
>> + * ti_sci_cmd_lpm_abort() - Abort entry to LPM by clearing selection of LPM to enter
>> + * @handle: pointer to TI SCI handle
>> + *
>> + * Return: 0 if all went well, else returns appropriate error value.
>> + */
>> +static int ti_sci_cmd_lpm_abort(const struct ti_sci_handle *handle)
>> +{
>> + struct ti_sci_info *info;
>> + struct ti_sci_msg_hdr *req;
>> + struct ti_sci_msg_hdr *resp;
>> + struct ti_sci_xfer *xfer;
>> + struct device *dev;
>> + int ret = 0;
>> +
>> + if (IS_ERR(handle))
>> + return PTR_ERR(handle);
>> + if (!handle)
>> + return -EINVAL;
>> +
>> + info = handle_to_ti_sci_info(handle);
>> + dev = info->dev;
>
> -ECONFUSED. ti_sci_complete already gets dev and info and this API is
> not exposed to other users. So why do we need to flip back and forth
> with info->handle and then get info from handle and dev again??
I had the parameter as 'const struct ti_sci_handle *handle' since all
other functions that send a message to DM have that as the parameter, so
I followed the convention. However, since the API is not exposed, I can
change the parameter to be 'struct device *dev' in the next version.
>> +
>> + xfer = ti_sci_get_one_xfer(info, TI_SCI_MSG_LPM_ABORT,
>> + TI_SCI_FLAG_REQ_ACK_ON_PROCESSED,
>> + sizeof(*req), sizeof(*resp));
>> + if (IS_ERR(xfer)) {
>> + ret = PTR_ERR(xfer);
>> + dev_err(dev, "Message alloc failed(%d)\n", ret);
>> + return ret;
>> + }
>> + req = (struct ti_sci_msg_hdr *)xfer->xfer_buf;
>> +
>> + ret = ti_sci_do_xfer(info, xfer);
>> + if (ret) {
>> + dev_err(dev, "Mbox send fail %d\n", ret);
>> + goto fail;
>> + }
>> +
>> + resp = (struct ti_sci_msg_hdr *)xfer->xfer_buf;
>> +
>> + if (!ti_sci_is_response_ack(resp))
>> + ret = -ENODEV;
>> + else
>> + ret = 0;
> is'nt ret already 0?
>
> OR you could go with ? like rest of code.. ;)
Good catch, will remove the else section there.
>
>> +
>> +fail:
>> + ti_sci_put_one_xfer(&info->minfo, xfer);
>> +
>> + return ret;
>> +}
>> +
>> static int ti_sci_cmd_core_reboot(const struct ti_sci_handle *handle)
>> {
>> struct ti_sci_info *info;
>> @@ -3739,11 +3791,20 @@ static int __maybe_unused ti_sci_resume_noirq(struct device *dev)
>> return 0;
>> }
>>
>> +static void __maybe_unused ti_sci_complete(struct device *dev)
>
> ti_sci_pm_complete or something like that?
Will change to this in v4.
>
>> +{
>> + struct ti_sci_info *info = dev_get_drvdata(dev);
>> +
>> + if (ti_sci_cmd_lpm_abort(&info->handle))
>
> I see from the documentation of .complete that it is invoked in
> multitude of scenarios, including resume as well. While I think it is
> probably fine to clear the state, have you had a chance to look at
> possible side effects in other flows (thaw etc..?)
Based on the documentation in the other flows I don't think it would
cause any side effects. Both ->restore() and ->thaw() hooks in
hibernation act similarly to ->resume(). Therefore, clearing the LPM
selection should work fine after those hooks.
>
> Additionally, do we want to check info->fw_caps &
> MSG_FLAG_CAPS_LPM_DM_MANAGED before sending it over to DM?
Yes, a check for MSG_FLAG_CAPS_LPM_DM_MANAGED should be added before
sending to DM. I'll add that in next version.
>
>> + dev_err(dev, "LPM clear selection failed.\n");
>> +}
>> +
>> static const struct dev_pm_ops ti_sci_pm_ops = {
>> #ifdef CONFIG_PM_SLEEP
>> .suspend = ti_sci_suspend,
>> .suspend_noirq = ti_sci_suspend_noirq,
>> .resume_noirq = ti_sci_resume_noirq,
>> + .complete = ti_sci_complete,
>
> Another question - when is .complete called as part of rewind? does DM
> behave sane while other drivers are resuming back up before .complete is
> invoked?
.complete is called after all drivers are resumed. DM does behave
normally during this. Adding the .complete makes it so that if a driver
failed during the first suspend cycle, DM won't have a stale LPM
selected. The stale LPM selection in DM would cause the DM to NACK
prepare_sleep on the next suspend cycle.
>
>> #endif
>> };
>>
>> diff --git a/drivers/firmware/ti_sci.h b/drivers/firmware/ti_sci.h
>> index 053387d7baa06..51d77f90a32cc 100644
>> --- a/drivers/firmware/ti_sci.h
>> +++ b/drivers/firmware/ti_sci.h
>> @@ -6,7 +6,7 @@
>> * The system works in a message response protocol
>> * See: https://software-dl.ti.com/tisci/esd/latest/index.html for details
>> *
>> - * Copyright (C) 2015-2024 Texas Instruments Incorporated - https://www.ti.com/
>> + * Copyright (C) 2015-2025 Texas Instruments Incorporated - https://www.ti.com/
>
> please dont keep shifting license year for trivial changes :)
>> */
>>
>> #ifndef __TI_SCI_H
>> @@ -42,6 +42,7 @@
>> #define TI_SCI_MSG_SET_IO_ISOLATION 0x0307
>> #define TI_SCI_MSG_LPM_SET_DEVICE_CONSTRAINT 0x0309
>> #define TI_SCI_MSG_LPM_SET_LATENCY_CONSTRAINT 0x030A
>> +#define TI_SCI_MSG_LPM_ABORT 0x0311
>
> NOTE: all the LPM stuff is enabled with MSG_FLAG_CAPS_LPM_DM_MANAGED.
> Is this supported from the very beginning version of firmware that
> has this? else will we see issues in the field with a mix of firmware
> versions.. some just crashing out when the message is not supported?
This is newly supported in firmware 11.0, whereas the other LPM features
were supported in firmware 10.0. I will have to check if there is any
way for abort to be not called if firmware doesn't support it.
>
>>
>> /* Resource Management Requests */
>> #define TI_SCI_MSG_GET_RESOURCE_RANGE 0x1500
>>
>> base-commit: 835244aba90de290b4b0b1fa92b6734f3ee7b3d9
>> --
>> 2.34.1
>>
>
Thanks for taking the time review at this patch :)
---
Best,
Kendall Willis
Powered by blists - more mailing lists