linux-kernel - Re: [External] Re: [PATCH 1/5] irqchip/riscv-intc: Balance priority and fairness during irq handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPYmKFtj3esCztK2++Chg-V1isAvVUXpEH3Pm-CaK=uZo_HfpA@mail.gmail.com>
Date: Wed, 15 Jan 2025 20:37:49 +0800
From: Xu Lu <luxu.kernel@...edance.com>
To: Anup Patel <anup@...infault.org>
Cc: daniel.lezcano@...aro.org, tglx@...utronix.de, paul.walmsley@...ive.com, 
	palmer@...belt.com, lihangjing@...edance.com, xieyongji@...edance.com, 
	linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH 1/5] irqchip/riscv-intc: Balance priority
 and fairness during irq handling

Hi Anup,

I agree that a standardized NMI or SSE event is the optimal solution
to the watchdog problem. Here I just use buddy watchdog to illustrate
the possible consequences of interrupts with low priority getting
starved.

I believe that AIA or firmware can adjust the priority of different
irqs. What I want to emphasize is that in existing implementation,
kernel will continue handling newly arrived high priority irqs, no
matter how long lower priority irqs have been waiting for.

For example, if both external irq and IPI (assuming non AIA
architecture using software irq) arrive in cycle 0, kernel will handle
the external irq first. Then a new external irq arrives during the
first external irq handling procedure, let's say cycle 100. Then
kernel finishes the first external irq handling procedure, let's say
cycle 200, it will handle the newly arrived external irq, instead of
the already arrived IPI. The IPI won't be handled until the second
external irq is handled, let's say cycle 300. Things get worse if
external interrupts keep arriving.

I think it is better to provide a mechanism to avoid this. So I regard
interrupts arriving within a certain period as a round and only handle
interrupts in the new round after all interrupts in the old round have
been handled. The interrupt priority only takes effect in the same
round.

Of course this is not the optimal way. Please give some advice if you
also think it is necessary (for example introduce an interrupt
priority decreasing mechanism instead).

Best Regards,

Xu Lu

On Wed, Jan 15, 2025 at 7:39 PM Anup Patel <anup@...infault.org> wrote:
>
> On Mon, Jan 13, 2025 at 8:39 PM Xu Lu <luxu.kernel@...edance.com> wrote:
> >
> > Both csr cause and csr topi record the pending bit with the highest
> > priority. If interrupts with high priority arrive frequently within a
> > certain period of time, the interrupts with low priority won't get a
> > chance to be handled.
> >
> > For example, if external interrupts and software interrupts arrive very
> > frequently, the timer interrupts will never be handled. Then buddy
> > watchdog on a buddy CPU will report a hardlockup on the current CPU while
> > current CPU actually can receive irq.
>
> Platform with proper HW watchdog will not see this issue because HW
> watchdog interrupt will be an external interrupt.
>
> There was an effort to standardize watchdog for RISC-V platforms but it was
> deferred to future standardization. We should resume those discussions within
> RVI forums.
>
> >
> > This commit solves this problem by handling all pending irqs in a round.
> > During each round, this commit handles pending irqs by their priority.
> >
> > Signed-off-by: Xu Lu <luxu.kernel@...edance.com>
> > ---
> >  drivers/irqchip/irq-riscv-intc.c | 32 ++++++++++++++++++++++++++------
> >  1 file changed, 26 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-riscv-intc.c b/drivers/irqchip/irq-riscv-intc.c
> > index f653c13de62b..bc2ec26aa9e9 100644
> > --- a/drivers/irqchip/irq-riscv-intc.c
> > +++ b/drivers/irqchip/irq-riscv-intc.c
> > @@ -26,20 +26,40 @@ static unsigned int riscv_intc_nr_irqs __ro_after_init = BITS_PER_LONG;
> >  static unsigned int riscv_intc_custom_base __ro_after_init = BITS_PER_LONG;
> >  static unsigned int riscv_intc_custom_nr_irqs __ro_after_init;
> >
> > +static unsigned int riscv_prio_irqs[] = {
> > +#ifdef CONFIG_RISCV_M_MODE
> > +       IRQ_M_EXT, IRQ_M_SOFT, IRQ_M_TIMER,
> > +#endif
> > +       IRQ_S_EXT, IRQ_S_SOFT, IRQ_S_TIMER, IRQ_S_GEXT,
> > +       IRQ_VS_EXT, IRQ_VS_SOFT, IRQ_VS_TIMER,
> > +       IRQ_PMU_OVF,
> > +};
> > +
> >  static void riscv_intc_irq(struct pt_regs *regs)
> >  {
> > -       unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG;
> > -
> > -       if (generic_handle_domain_irq(intc_domain, cause))
> > -               pr_warn_ratelimited("Failed to handle interrupt (cause: %ld)\n", cause);
> > +       unsigned long pending = csr_read(CSR_IP) & csr_read(CSR_IE);
> > +       unsigned int i;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
> > +               if (pending & (1UL << riscv_prio_irqs[i]))
> > +                       if (generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]))
> > +                               pr_warn_ratelimited("Failed to handle interrupt (cause: %u)\n",
> > +                                                   riscv_prio_irqs[i]);
>
> This is overriding the builtin priority assignment for local interrupts as
> defined by the RISC-V privileged specification for non-AIA systems
> which changes the expected behaviour on existing platforms.
>
> >  }
> >
> >  static void riscv_intc_aia_irq(struct pt_regs *regs)
> >  {
> >         unsigned long topi;
> > +       unsigned long pending;
> > +       unsigned int i;
> > +
> > +       while ((topi = csr_read(CSR_TOPI))) {
> > +               pending = csr_read(CSR_IP) & csr_read(CSR_IE);
> >
> > -       while ((topi = csr_read(CSR_TOPI)))
> > -               generic_handle_domain_irq(intc_domain, topi >> TOPI_IID_SHIFT);
> > +               for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
> > +                       if (pending & (1UL << riscv_prio_irqs[i]))
> > +                               generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]);
> > +       }
>
> The AIA specification already provides a mechanism to change
> priorities of local interrupts. The firmware or bootloader can always
> provide custom priorities to local interrupts.
>
> In general, requiring certain local interrupts to have higher priority is
> platform or use-case specific and should be done as AIA configuration
> in firmware or bootloader.
>
> NAK to this patch from my side since it is just adding a software
> work-around for a missing standard watchdog in the RISC-V
> ecosystem.
>
> One possible approach is to let platforms have their implementation
> specific M-mode watchdog and for supervisor software (both HS and
> VS-mode) we can have SBI based watchdog where the supervisor
> watchdog expiry is an SSE (higher priority than all local interrupts).
>
> Regards,
> Anup