[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a5b5f9a4-0a75-764f-c949-4b90ec9decd6@quicinc.com>
Date: Fri, 15 Oct 2021 09:54:01 +0800
From: Fenglin Wu <quic_fenglinw@...cinc.com>
To: Stephen Boyd <sboyd@...nel.org>, <linux-arm-msm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
CC: <collinsd@...eaurora.org>, <subbaram@...eaurora.org>,
<tglx@...utronix.de>, <maz@...nel.org>
Subject: Re: [RESEND PATCH v1 3/9] spmi: pmic-arb: check apid against limits
before calling irq handler
On 10/15/2021 9:15 AM, Stephen Boyd wrote:
> Quoting Fenglin Wu (2021-10-13 20:11:40)
>> On 10/14/2021 3:25 AM, Stephen Boyd wrote:
>>> Quoting Fenglin Wu (2021-10-12 22:31:22)
>>>> On 10/13/2021 2:02 AM, Stephen Boyd wrote:
>>>>> Quoting Fenglin Wu (2021-09-16 23:32:58)
>>>>>> From: David Collins <collinsd@...eaurora.org>
>>>>>>
>>>>>> Check that the apid for an SPMI interrupt falls between the
>>>>>> min_apid and max_apid that can be handled by the APPS processor
>>>>>> before invoking the per-apid interrupt handler:
>>>>>> periph_interrupt().
>>>>>>
>>>>>> This avoids an access violation in rare cases where the status
>>>>>> bit is set for an interrupt that is not owned by the APPS
>>>>>> processor.
>>>>>>
>>>>>> Signed-off-by: David Collins <collinsd@...eaurora.org>
>>>>>> Signed-off-by: Fenglin Wu <quic_fenglinw@...cinc.com>
>>>>>> ---
>>>>> Fixes? BTW, a lot of these patches are irqchip specific. It would be
>>>>> good to get review from irqchip maintainers. Maybe we should split the
>>>>> irqchip driver off via the auxiliary bus so that irqchip maintainers can
>>>>> review. Please Cc them on irqchip related patches.
>>>>>
>>>>> IRQCHIP DRIVERS
>>>>> M: Thomas Gleixner <tglx@...utronix.de>
>>>>> M: Marc Zyngier <maz@...nel.org>
>>>> Sure, copied Thomas and Marc for code review.
>>>> This is a fix to avoid the register access violation in a case that an
>>>> interrupt is fired in a PMIC module which is not owned by APPS
>>>> processor.
>>> Got it.
>>>
>>>>>> drivers/spmi/spmi-pmic-arb.c | 6 ++++++
>>>>>> 1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
>>>>>> index 4d7ad004..c4adc06 100644
>>>>>> --- a/drivers/spmi/spmi-pmic-arb.c
>>>>>> +++ b/drivers/spmi/spmi-pmic-arb.c
>>>>>> @@ -535,6 +535,12 @@ static void pmic_arb_chained_irq(struct irq_desc *desc)
>>>>>> id = ffs(status) - 1;
>>>>>> status &= ~BIT(id);
>>>>>> apid = id + i * 32;
>>>>>> + if (apid < pmic_arb->min_apid
>>>>>> + || apid > pmic_arb->max_apid) {
>>>>> The || goes on the line above. What about making a local variable for
>>>>> first and last and then shifting by 5 in the loop?
>>>>>
>>>>> int first = pmic_arb->min_apid;
>>>>> int last = pmic_arb->max_apid;
>>>>>
>>>>> for (i = first >> 5; i <= last >> 5; i++)
>>>>>
>>>>> if (apid < first || apid > last)
>>>> ACK, will update it following this.
>>>>>> + WARN_ONCE(true, "spurious spmi irq received for apid=%d\n",
>>>>>> + apid);
>>>>> Is there any way to recover from this? Or once the mapping is wrong
>>>>> we're going to get interrupts that we don't know what to do with
>>>>> forever?
>>>> This is a rare case that the unexpected interrupt is fired in a module
>>>> not owned by APPS process, so the interrupt itself is not expected hence
>>>> no need to recover from this but just bail out to avoid following register
>>>> access violation.
>>> And then the irq stops coming? It feels like a misconfiguration in the
>>> firmware that we're trying to hide, hence the WARN_ONCE(). Can we
>>> somehow silence irqs that aren't owned by the APPS when this driver
>>> probes so that they can't even happen after probe?
>> Actually this is a rarely happened case that couldn't be reproduced easily
>> and consistently for further debug. I agreed this should be caused by HW
>> misconfiguration or even some unknown HW bug that it would send out SPMI
>> interrupt messages with incorrect APID, but we have never had any chance
>> to find out the root cause. The patch here simply checked the APID and
>> bail out if it's not in the valid range, it won't cause anything bad but
>> improves the SW robustness. After that, the IRQ won't be triggered again
>> because the latched status in PMIC is not cleared. Also, because of the
>> access restriction to the registers corresponding to this APID, there is
>> nothing we can do from APPS processor side to keep it silent.
> This patch seems like a band-aid for an issue that isn't fully
> understood. I suppose it's good that the irq will stay asserted forever
> and then it won't happen again until it gets cleared by some other
> processor in the SoC. Instead of the WARN_ONCE() can we track if any irq
> is handled when the chained irq is raised, and if nothing is handled
> then call handle_bad_irq() on the chained descriptor? Take a look at
> pinctrl-msm.c to see how they handled spurious irqs that aren't actually
> directed at the APPS processor. We should do something similar here.
Sure, I will do it that way.
Powered by blists - more mailing lists