Message-ID: <c15f083d-a2c1-462a-aad4-a72b36fbe1ac@oss.qualcomm.com>
Date: Fri, 31 Oct 2025 17:03:56 +0800
From: "Aiqun(Maria) Yu" <aiqun.yu@....qualcomm.com>
To: Stephan Gerhold <stephan.gerhold@...aro.org>,
Jingyi Wang <jingyi.wang@....qualcomm.com>
Cc: Bjorn Andersson <andersson@...nel.org>,
Mathieu Poirier <mathieu.poirier@...aro.org>,
Rob Herring <robh@...nel.org>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
Conor Dooley <conor+dt@...nel.org>,
Manivannan Sadhasivam <mani@...nel.org>,
tingwei.zhang@....qualcomm.com, trilok.soni@....qualcomm.com,
yijie.yang@....qualcomm.com, linux-arm-msm@...r.kernel.org,
linux-remoteproc@...r.kernel.org, devicetree@...r.kernel.org,
linux-kernel@...r.kernel.org,
Gokul krishna Krishnakumar <Gokul.krishnakumar@....qualcomm>
Subject: Re: [PATCH v2 4/7] remoteproc: qcom: pas: Add late attach support for
subsystems
On 10/29/2025 6:03 PM, Stephan Gerhold wrote:
> On Wed, Oct 29, 2025 at 01:05:42AM -0700, Jingyi Wang wrote:
>> From: Gokul krishna Krishnakumar <Gokul.krishnakumar@....qualcomm>
>>
>> Subsystems can be brought out of reset by entities such as
>> bootloaders. Before attaching such subsystems, it is important to
>> check the state of the subsystem. This patch adds support to attach
>> to a subsystem by ensuring that the subsystem is in a sane state by
>> reading SMP2P bits and pinging the subsystem.
>>
>> Signed-off-by: Gokul krishna Krishnakumar <Gokul.krishnakumar@....qualcomm>
>> Co-developed-by: Jingyi Wang <jingyi.wang@....qualcomm.com>
>> Signed-off-by: Jingyi Wang <jingyi.wang@....qualcomm.com>
>> ---
>> drivers/remoteproc/qcom_q6v5.c | 89 ++++++++++++++++++++++++++++++++++++-
>> drivers/remoteproc/qcom_q6v5.h | 14 +++++-
>> drivers/remoteproc/qcom_q6v5_adsp.c | 2 +-
>> drivers/remoteproc/qcom_q6v5_mss.c | 2 +-
>> drivers/remoteproc/qcom_q6v5_pas.c | 63 +++++++++++++++++++++++++-
>> 5 files changed, 165 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/remoteproc/qcom_q6v5.c b/drivers/remoteproc/qcom_q6v5.c
>> index 58d5b85e58cd..4ce9e43fc5c7 100644
>> --- a/drivers/remoteproc/qcom_q6v5.c
>> +++ b/drivers/remoteproc/qcom_q6v5.c
>> [...]
>> @@ -234,6 +246,77 @@ unsigned long qcom_q6v5_panic(struct qcom_q6v5 *q6v5)
>> }
>> EXPORT_SYMBOL_GPL(qcom_q6v5_panic);
>>
>> +static irqreturn_t q6v5_pong_interrupt(int irq, void *data)
>> +{
>> + struct qcom_q6v5 *q6v5 = data;
>> +
>> + complete(&q6v5->ping_done);
>> +
>> + return IRQ_HANDLED;
>> +}
>> +
>> +int qcom_q6v5_ping_subsystem(struct qcom_q6v5 *q6v5)
>> +{
>> + int ret;
>> + int ping_failed = 0;
>> +
>> + reinit_completion(&q6v5->ping_done);
>> +
>> + /* Set master kernel Ping bit */
>> + ret = qcom_smem_state_update_bits(q6v5->ping_state,
>> + BIT(q6v5->ping_bit), BIT(q6v5->ping_bit));
>> + if (ret) {
>> + dev_err(q6v5->dev, "Failed to update ping bits\n");
>> + return ret;
>> + }
>> +
>> + ret = wait_for_completion_timeout(&q6v5->ping_done, msecs_to_jiffies(PING_TIMEOUT));
>> + if (!ret) {
>> + ping_failed = -ETIMEDOUT;
>> + dev_err(q6v5->dev, "Failed to get back pong\n");
>> + }
>> +
>> + /* Clear ping bit master kernel */
>> + ret = qcom_smem_state_update_bits(q6v5->ping_state, BIT(q6v5->ping_bit), 0);
>> + if (ret) {
>> + pr_err("Failed to clear master kernel bits\n");
>
> dev_err()?
>
>> + return ret;
>> + }
>> +
>> + if (ping_failed)
>> + return ping_failed;
>
> Could just "return ping_failed;" directly.
>
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_q6v5_ping_subsystem);
>> +
>> +int qcom_q6v5_ping_subsystem_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev)
>> +{
>> + int ret = -ENODEV;
>> +
>> + q6v5->ping_state = devm_qcom_smem_state_get(&pdev->dev, "ping", &q6v5->ping_bit);
>> + if (IS_ERR(q6v5->ping_state)) {
>> + dev_err(&pdev->dev, "failed to acquire smem state %ld\n",
>> + PTR_ERR(q6v5->ping_state));
>> + return ret;
>
> return PTR_ERR(q6v5->ping_state)?
>
>> + }
>> +
>> + q6v5->pong_irq = platform_get_irq_byname(pdev, "pong");
>> + if (q6v5->pong_irq < 0)
>> + return q6v5->pong_irq;
>> +
>> + ret = devm_request_threaded_irq(&pdev->dev, q6v5->pong_irq, NULL,
>> + q6v5_pong_interrupt, IRQF_TRIGGER_RISING | IRQF_ONESHOT,
>> + "q6v5 pong", q6v5);
>> + if (ret)
>> + dev_err(&pdev->dev, "failed to acquire pong IRQ\n");
>> +
>> + init_completion(&q6v5->ping_done);
>
> It would be better to have init_completion() before requesting the
> interrupt, to guarantee that complete(&q6v5->ping_done); cannot happen
> before the completion struct is initialized.
>
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_q6v5_ping_subsystem_init);
>> +
>> /**
>> * qcom_q6v5_init() - initializer of the q6v5 common struct
>> * @q6v5: handle to be initialized
>> @@ -247,7 +330,7 @@ EXPORT_SYMBOL_GPL(qcom_q6v5_panic);
>> */
>> int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
>> struct rproc *rproc, int crash_reason, const char *load_state,
>> - void (*handover)(struct qcom_q6v5 *q6v5))
>> + bool early_boot, void (*handover)(struct qcom_q6v5 *q6v5))
>> {
>> int ret;
>>
>> @@ -255,10 +338,14 @@ int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
>> q6v5->dev = &pdev->dev;
>> q6v5->crash_reason = crash_reason;
>> q6v5->handover = handover;
>> + q6v5->early_boot = early_boot;
>>
>> init_completion(&q6v5->start_done);
>> init_completion(&q6v5->stop_done);
>>
>> + if (early_boot)
>> + init_completion(&q6v5->subsys_booted);
>> +
>> q6v5->wdog_irq = platform_get_irq_byname(pdev, "wdog");
>> if (q6v5->wdog_irq < 0)
>> return q6v5->wdog_irq;
>> diff --git a/drivers/remoteproc/qcom_q6v5.h b/drivers/remoteproc/qcom_q6v5.h
>> index 5a859c41896e..8a227bf70d7e 100644
>> --- a/drivers/remoteproc/qcom_q6v5.h
>> +++ b/drivers/remoteproc/qcom_q6v5.h
>> @@ -12,27 +12,35 @@ struct rproc;
>> struct qcom_smem_state;
>> struct qcom_sysmon;
>>
>> +#define PING_TIMEOUT 500 /* in milliseconds */
>> +#define PING_TEST_WAIT 500 /* in milliseconds */
>
> Why is this defined in the shared header rather than the C file that
> uses this?
>
> PING_TEST_WAIT looks unused.
>
>> +
>> struct qcom_q6v5 {
>> struct device *dev;
>> struct rproc *rproc;
>>
>> struct qcom_smem_state *state;
>> + struct qcom_smem_state *ping_state;
>> struct qmp *qmp;
>>
>> struct icc_path *path;
>>
>> unsigned stop_bit;
>> + unsigned int ping_bit;
>>
>> int wdog_irq;
>> int fatal_irq;
>> int ready_irq;
>> int handover_irq;
>> int stop_irq;
>> + int pong_irq;
>>
>> bool handover_issued;
>>
>> struct completion start_done;
>> struct completion stop_done;
>> + struct completion subsys_booted;
>> + struct completion ping_done;
>>
>> int crash_reason;
>>
>> @@ -40,11 +48,13 @@ struct qcom_q6v5 {
>>
>> const char *load_state;
>> void (*handover)(struct qcom_q6v5 *q6v5);
>> +
>> + bool early_boot;
>> };
>>
>> int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
>> struct rproc *rproc, int crash_reason, const char *load_state,
>> - void (*handover)(struct qcom_q6v5 *q6v5));
>> + bool early_boot, void (*handover)(struct qcom_q6v5 *q6v5));
>> void qcom_q6v5_deinit(struct qcom_q6v5 *q6v5);
>>
>> int qcom_q6v5_prepare(struct qcom_q6v5 *q6v5);
>> @@ -52,5 +62,7 @@ int qcom_q6v5_unprepare(struct qcom_q6v5 *q6v5);
>> int qcom_q6v5_request_stop(struct qcom_q6v5 *q6v5, struct qcom_sysmon *sysmon);
>> int qcom_q6v5_wait_for_start(struct qcom_q6v5 *q6v5, int timeout);
>> unsigned long qcom_q6v5_panic(struct qcom_q6v5 *q6v5);
>> +int qcom_q6v5_ping_subsystem(struct qcom_q6v5 *q6v5);
>> +int qcom_q6v5_ping_subsystem_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev);
>>
>> #endif
>> [...]
>> diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
>> index 158bcd6cc85c..b667c11aadb5 100644
>> --- a/drivers/remoteproc/qcom_q6v5_pas.c
>> +++ b/drivers/remoteproc/qcom_q6v5_pas.c
>> @@ -35,6 +35,8 @@
>>
>> #define MAX_ASSIGN_COUNT 3
>>
>> +#define EARLY_BOOT_RETRY_INTERVAL_MS 5000
>> +
>> struct qcom_pas_data {
>> int crash_reason_smem;
>> const char *firmware_name;
>> @@ -59,6 +61,7 @@ struct qcom_pas_data {
>> int region_assign_count;
>> bool region_assign_shared;
>> int region_assign_vmid;
>> + bool early_boot;
>> };
>>
>> struct qcom_pas {
>> @@ -409,6 +412,8 @@ static int qcom_pas_stop(struct rproc *rproc)
>> if (pas->smem_host_id)
>> ret = qcom_smem_bust_hwspin_lock_by_host(pas->smem_host_id);
>>
>> + pas->q6v5.early_boot = false;
>> +
>> return ret;
>> }
>>
>> @@ -434,6 +439,51 @@ static unsigned long qcom_pas_panic(struct rproc *rproc)
>> return qcom_q6v5_panic(&pas->q6v5);
>> }
>>
>> +static int qcom_pas_attach(struct rproc *rproc)
>> +{
>> + int ret;
>> + struct qcom_pas *adsp = rproc->priv;
>> + bool ready_state;
>> + bool crash_state;
>> +
>> + if (!adsp->q6v5.early_boot)
>> + return -EINVAL;
>> +
>> + ret = irq_get_irqchip_state(adsp->q6v5.fatal_irq,
>> + IRQCHIP_STATE_LINE_LEVEL, &crash_state);
>> +
>> + if (crash_state) {
>
> crash_state will be uninitialized if irq_get_irqchip_state() returns an
> error.
Good catch.
I suggest checking the return value. If the state of fatal_irq cannot be
read, just fail the attach; there is no need to attempt crash reporting.
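A rough userspace mock-up of that check, only to illustrate the control
flow. mock_get_line_level() and struct mock_irq are hypothetical stand-ins
for irq_get_irqchip_state() and the real IRQ, not the kernel API, and
check_fatal_line() is not a proposed driver function:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an IRQ line; not a kernel structure. */
struct mock_irq {
	int err;	/* error the state query should return, 0 on success */
	bool level;	/* line level reported on success */
};

/* Hypothetical stand-in for irq_get_irqchip_state(). */
static int mock_get_line_level(const struct mock_irq *irq, bool *state)
{
	if (irq->err)
		return irq->err;	/* *state is left untouched on failure */

	*state = irq->level;
	return 0;
}

enum attach_check { ATTACH_OK, ATTACH_CRASHED };

/*
 * Returns a negative errno if the fatal line cannot be read. Otherwise
 * returns 0 and reports through *out whether the subsystem had crashed.
 */
static int check_fatal_line(const struct mock_irq *fatal_irq,
			    enum attach_check *out)
{
	bool crash_state = false;
	int ret;

	ret = mock_get_line_level(fatal_irq, &crash_state);
	if (ret)
		return ret;	/* fail the attach, skip crash reporting */

	*out = crash_state ? ATTACH_CRASHED : ATTACH_OK;
	return 0;
}
```

With this shape crash_state is only read after a successful query, so the
uninitialized read cannot happen.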
>
>> + dev_err(adsp->dev, "Sub system has crashed before driver probe\n");
>> + adsp->rproc->state = RPROC_CRASHED;
>> + return -EINVAL;
>
> Ok, so the subsystem has crashed. Now what? We probably want to restart
> it, but I don't think anyone will handle the RPROC_CRASHED state you are
> setting here.
Agreed. Setting RPROC_CRASHED should be left to the crash handler.
>
> I think it would make more sense to call rproc_report_crash() here. This
> will set RPROC_CRASHED for you and trigger recovery. I'm not sure if
> this works properly in RPROC_DETACHED state, please test to make sure
> that works as intended.
Agreed.
I suggest having:
q6v5->running = false;
rproc_report_crash(q6v5->rproc, RPROC_FATAL_ERROR);
The test could be performed like this:
Temporarily hack the code so that it always takes the crash_state path
here, then check whether crash recovery works as intended.
>
>> + }
>> +
>> + ret = irq_get_irqchip_state(adsp->q6v5.ready_irq,
>> + IRQCHIP_STATE_LINE_LEVEL, &ready_state);
>> +
>> + if (ready_state) {
>> + dev_info(adsp->dev, "Sub system has boot-up before driver probe\n");
>
> This message feels redundant, dmesg already shows a different message
> for "attaching" vs "booting" a remoteproc.
>
>> + adsp->rproc->state = RPROC_DETACHED;
>
> What is the point of this assignment? You have already set this state
> inside qcom_pas_probe().
Makes sense.
>
>> + } else {
>> + ret = wait_for_completion_timeout(&adsp->q6v5.subsys_booted,
>> + msecs_to_jiffies(EARLY_BOOT_RETRY_INTERVAL_MS));
>> + if (!ret) {
>> + dev_err(adsp->dev, "Timeout on waiting for subsystem interrupt\n");
>> + return -ETIMEDOUT;
>> + }
>
> This looks like you want to handle the case where the remoteproc is
> still booting while this code is running (i.e. it has not finished
> booting yet by signaling the ready state). Is this situation actually
> possible with the current firmware design?
This shouldn't happen during the initial boot stage, as far as I understand.
This remoteproc is needed by the bootloader/firmware before the kernel
even starts, so it shouldn't still be booting at that point. If it were,
the early_boot feature wouldn't be necessary at all.
However, on a second attempt to attach, especially when
RPROC_FEAT_ATTACH_ON_RECOVERY is enabled, this could occur as a corner
case during boot-up (the remoteproc is auto-booting while the kernel is
trying to attach and checks the ready state).
>
> I don't see how this would reliably work in practice. If firmware boots
> a remoteproc early it should wait until:
>
> - Handover is signaled, to ensure the proxy votes are kept
> - Ready is signaled, to ensure the metadata region remains reserved
>
> None of this is guaranteed if the firmware gives up control to Linux
> before waiting for the signals.
>
> I would suggest dropping all the code related to handling the late
> "subsys_booted" completion. If this is needed, can you explain when
> exactly this situation happens and how you guarantee reliable startup of
> the remoteproc?
For the Kaanapali-specific remoteproc (soccp) with the early-boot feature
here, recovery relies on rproc_shutdown/rproc_boot. A not-ready state
like this should be a rare corner case, such as a bootloader/firmware
bug. Maybe we can simply remove the "subsys_booted" mechanism and only
call rproc_report_crash in this corner case.
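To make the simplified flow concrete, here is a small userspace mock-up of
the decision only. The enum and mock_attach_decide() are hypothetical names
for illustration, and treating a failed ping the same way as a not-ready
state is my assumption, not something settled in this thread:

```c
#include <stdbool.h>

/* Hypothetical outcome codes for this mock-up; not part of the driver. */
enum mock_attach_result {
	MOCK_ATTACHED,		/* subsystem is up and answered the ping */
	MOCK_REPORT_CRASH,	/* driver would call rproc_report_crash() */
};

/*
 * ready: level of the "ready" line when attach runs
 * pong:  whether the subsystem answered the ping
 */
static enum mock_attach_result mock_attach_decide(bool ready, bool pong)
{
	/* No waiting for a late ready signal: not ready means recovery. */
	if (!ready)
		return MOCK_REPORT_CRASH;

	/* Assumption: an unanswered ping is handled like a crash as well. */
	if (!pong)
		return MOCK_REPORT_CRASH;

	return MOCK_ATTACHED;
}
```

The point is that there is no completion and no timeout left in the attach
path; anything other than "ready and responding" goes to recovery.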
>
>> + }
>> +
>> + ret = qcom_q6v5_ping_subsystem(&adsp->q6v5);
>> + if (ret) {
>> + dev_err(adsp->dev, "Failed to ping subsystem, assuming device crashed\n");
>> + rproc->state = RPROC_CRASHED;
>> + return ret;
>> + }
>> +
>> + adsp->q6v5.running = true;
>
> You should probably also set q6v5->handover_issued = true;, otherwise
> qcom_pas_stop() will later drop all the handover votes that you have
> never made. This will break all the reference counting.
Acked for all of the above comments; they are well understood.
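For the handover_issued point, a tiny userspace model of the imbalance as I
understand it. struct mock_q6v5, its proxy_votes counter, and both functions
are hypothetical; the counter only stands in for the clock/regulator/power
domain references that the real stop path would drop:

```c
#include <stdbool.h>

/* Hypothetical model of the relevant q6v5 state; not the driver struct. */
struct mock_q6v5 {
	bool running;
	bool handover_issued;
	int proxy_votes;	/* stand-in for the proxy vote refcounts */
};

static void mock_attach(struct mock_q6v5 *q6v5, bool mark_handover)
{
	q6v5->running = true;

	/* No proxy votes are taken here: the bootloader started the subsystem. */
	if (mark_handover)
		q6v5->handover_issued = true;
}

static void mock_stop(struct mock_q6v5 *q6v5)
{
	q6v5->running = false;

	/* Stop drops the proxy votes if handover was never signaled. */
	if (!q6v5->handover_issued)
		q6v5->proxy_votes--;
}
```

Attaching without marking the handover and then stopping leaves proxy_votes
at -1 in this model, while marking it keeps the count balanced at 0.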
>
> Overall, this patch feels quite fragile in the current state. Please
> make sure you carefully consider all side effects and new edge cases
> introduced by your changes.
As for other edge cases and side effects, maybe Stephan can help by
providing more details.
>
> Thanks,
> Stephan
--
Thx and BRs,
Aiqun(Maria) Yu