[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEEQ3wmDFfmbr0fGxNUOX16p0b5zeR8bKTDTePuCFPYaKOyuqg@mail.gmail.com>
Date: Mon, 22 Sep 2025 19:58:03 +0800
From: yunhui cui <cuiyunhui@...edance.com>
To: Anup Patel <anup@...infault.org>
Cc: Clément Léger <cleger@...osinc.com>,
paul.walmsley@...ive.com, palmer@...belt.com, aou@...s.berkeley.edu,
alex@...ti.fr, conor@...nel.org, atishp@...osinc.com, ajones@...tanamicro.com,
apatel@...tanamicro.com, mchitale@...tanamicro.com,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH RFC 1/1] drivers: firmware: riscv: add
unknown NMI support
Hi Anup,
On Mon, Sep 22, 2025 at 6:40 PM Anup Patel <anup@...infault.org> wrote:
>
> On Mon, Sep 22, 2025 at 1:42 PM yunhui cui <cuiyunhui@...edance.com> wrote:
> >
> > Hi Clément,
> >
> > On Fri, Sep 19, 2025 at 3:52 PM yunhui cui <cuiyunhui@...edance.com> wrote:
> > >
> > > Hi Clément,
> > >
> > > On Fri, Sep 19, 2025 at 3:18 PM Clément Léger <cleger@...osinc.com> wrote:
> > > >
> > > >
> > > >
> > > > On 19/09/2025 09:00, Yunhui Cui wrote:
> > > > > Unknown NMI can force the kernel to respond (e.g., panic) when the
> > > > > system encounters unrecognized critical hardware events, aiding in
> > > > > troubleshooting system faults. This is implemented via the Supervisor
> > > > > Software Events (SSE) framework.
> > > > >
> > > > > Signed-off-by: Yunhui Cui <cuiyunhui@...edance.com>
> > > > > ---
> > > > > arch/riscv/include/asm/sbi.h | 1 +
> > > > > drivers/firmware/riscv/Kconfig | 10 +++++
> > > > > drivers/firmware/riscv/Makefile | 1 +
> > > > > drivers/firmware/riscv/sse_nmi.c | 77 ++++++++++++++++++++++++++++++++
> > > > > 4 files changed, 89 insertions(+)
> > > > > create mode 100644 drivers/firmware/riscv/sse_nmi.c
> > > > >
> > > > > diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> > > > > index 874cc1d7603a5..5801f90a88f62 100644
> > > > > --- a/arch/riscv/include/asm/sbi.h
> > > > > +++ b/arch/riscv/include/asm/sbi.h
> > > > > @@ -481,6 +481,7 @@ enum sbi_sse_attr_id {
> > > > >
> > > > > #define SBI_SSE_EVENT_LOCAL_HIGH_PRIO_RAS 0x00000000
> > > > > #define SBI_SSE_EVENT_LOCAL_DOUBLE_TRAP 0x00000001
> > > > > +#define SBI_SSE_EVENT_LOCAL_UNKNOWN_NMI 0x00000002
> > > >
> > > > Was this submitted to the PRS WG ? This a specification modification so
> > > > it should go through the usual process.
> > > >
> > > > > #define SBI_SSE_EVENT_GLOBAL_HIGH_PRIO_RAS 0x00008000
> > > > > #define SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW 0x00010000
> > > > > #define SBI_SSE_EVENT_LOCAL_LOW_PRIO_RAS 0x00100000
> > > > > diff --git a/drivers/firmware/riscv/Kconfig b/drivers/firmware/riscv/Kconfig
> > > > > index ed5b663ac5f91..746bac862ac46 100644
> > > > > --- a/drivers/firmware/riscv/Kconfig
> > > > > +++ b/drivers/firmware/riscv/Kconfig
> > > > > @@ -12,4 +12,14 @@ config RISCV_SBI_SSE
> > > > > this option provides support to register callbacks on specific SSE
> > > > > events.
> > > > >
> > > > > +config RISCV_SSE_UNKNOWN_NMI
> > > > > + bool "Enable SBI Supervisor Software Events unknown NMI support"
> > > > > + depends on RISCV_SBI_SSE
> > > > > + default y
> > > > > + help
> > > > > + This option enables support for delivering unknown Non-Maskable Interrupt (NMI)
> > > > > + notifications via the Supervisor Software Events (SSE) framework. When enabled,
> > > > > + unknown NMIs can trigger kernel responses (e.g., panic) for unrecognized critical
> > > > > + hardware events, aiding in system fault diagnosis.
> > > > > +
> > > > > endmenu
> > > > > diff --git a/drivers/firmware/riscv/Makefile b/drivers/firmware/riscv/Makefile
> > > > > index c8795d4bbb2ea..9242c6cd5e3e9 100644
> > > > > --- a/drivers/firmware/riscv/Makefile
> > > > > +++ b/drivers/firmware/riscv/Makefile
> > > > > @@ -1,3 +1,4 @@
> > > > > # SPDX-License-Identifier: GPL-2.0
> > > > >
> > > > > obj-$(CONFIG_RISCV_SBI_SSE) += riscv_sbi_sse.o
> > > > > +obj-$(CONFIG_RISCV_SSE_UNKNOWN_NMI) += sse_nmi.o
> > > > > diff --git a/drivers/firmware/riscv/sse_nmi.c b/drivers/firmware/riscv/sse_nmi.c
> > > > > new file mode 100644
> > > > > index 0000000000000..43063f42efff0
> > > > > --- /dev/null
> > > > > +++ b/drivers/firmware/riscv/sse_nmi.c
> > > > > @@ -0,0 +1,77 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0+
> > > > > +
> > > > > +#include <linux/mm.h>
> > > > > +#include <linux/nmi.h>
> > > > > +#include <linux/riscv_sbi_sse.h>
> > > > > +#include <linux/sched/debug.h>
> > > > > +#include <linux/sysctl.h>
> > > > > +
> > > > > +#include <asm/irq_regs.h>
> > > > > +#include <asm/sbi.h>
> > > > > +
> > > > > +int panic_on_unknown_nmi = 1;
> > > > > +struct sse_event *evt;
> > > > > +static struct ctl_table_header *unknown_nmi_sysctl_header;
> > > > > +
> > > > > +const struct ctl_table unknown_nmi_table[] = {
> > > > > + {
> > > > > + .procname = "panic_enable",
> > > > > + .data = &panic_on_unknown_nmi,
> > > > > + .maxlen = sizeof(int),
> > > > > + .mode = 0644,
> > > > > + .proc_handler = proc_dointvec_minmax,
> > > > > + .extra1 = SYSCTL_ZERO,
> > > > > + .extra2 = SYSCTL_ONE,
> > > > > + },
> > > > > +};
> > > > > +
> > > > > +static void nmi_handler(struct pt_regs *regs)
> > > > > +{
> > > > > + pr_emerg("NMI received for unknown on CPU %d.\n", smp_processor_id());
> > > > > +
> > > > > + if (panic_on_unknown_nmi)
> > > > > + nmi_panic(regs, "NMI: Not continuing");
> > > > > +
> > > > > + pr_emerg("Dazed and confused, but trying to continue\n");
> > > > > +}
> > > >
> > > > I'm dazed and confused as well ;) What's the point of this except
> > > > interrupting the kernel with a panic ? It seems like it's a better idea
> > > > to let the firmware handle that properly and display whatever
> > > > information are needed. Was your idea to actually force the kernel to
> > > > enter in some debug mode ?
> > >
> > > There is an important scenario: when the kernel becomes unresponsive,
> > > we need to trigger an unknown NMI to cause the system to panic() and
> > > then collect the vmcore, and such a requirement is common on x86
> > > servers.
> > >
> > > >
> > > > Thanks,
> > > >
> > > > Clément
> > > >
> > > > > +
> > > > > +static int nmi_sse_handler(u32 evt, void *arg, struct pt_regs *regs)
> > > > > +{
> > > > > + nmi_handler(regs);
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +static int sse_nmi_init(void)
> > > > > +{
> > > > > + int ret;
> > > > > +
> > > > > + evt = sse_event_register(SBI_SSE_EVENT_LOCAL_UNKNOWN_NMI, 0,
> > > > > + nmi_sse_handler, NULL);
> >
> > Should we add this UNKNOWN_NMI event ID in Chapter 17 of the SBI spec?
>
> The ratified SBI v3.0 defines a "Software injected local event" (ID 0xffff0000)
> which can be used to inject panic() on a particular HART from another HART.
Has SBI_SSE_EVENT_LOCAL_SOFTWARE (0xffff0000) been given meaning
currently? The scenario of the unknown NMI is that after a HART
receives an interrupt from the platform, M-mode responds to this
interrupt and then enters S-mode to execute the registered handler.
Can SBI_SSE_EVENT_LOCAL_SOFTWARE be used directly?
>
> Regards,
> Anup
Thanks,
Yunhui
Powered by blists - more mailing lists