[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <52792fdf-e12f-4de5-ad9e-069a0f2fef42@rivosinc.com>
Date: Mon, 22 Sep 2025 10:31:39 +0200
From: Clément Léger <cleger@...osinc.com>
To: yunhui cui <cuiyunhui@...edance.com>
Cc: paul.walmsley@...ive.com, palmer@...belt.com, aou@...s.berkeley.edu,
alex@...ti.fr, conor@...nel.org, atishp@...osinc.com,
ajones@...tanamicro.com, apatel@...tanamicro.com, mchitale@...tanamicro.com,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH RFC 1/1] drivers: firmware: riscv: add
unknown NMI support
On 22/09/2025 10:11, yunhui cui wrote:
> Hi Clément,
>
> On Fri, Sep 19, 2025 at 3:52 PM yunhui cui <cuiyunhui@...edance.com> wrote:
>>
>> Hi Clément,
>>
>> On Fri, Sep 19, 2025 at 3:18 PM Clément Léger <cleger@...osinc.com> wrote:
>>>
>>>
>>>
>>> On 19/09/2025 09:00, Yunhui Cui wrote:
>>>> Unknown NMI can force the kernel to respond (e.g., panic) when the
>>>> system encounters unrecognized critical hardware events, aiding in
>>>> troubleshooting system faults. This is implemented via the Supervisor
>>>> Software Events (SSE) framework.
>>>>
>>>> Signed-off-by: Yunhui Cui <cuiyunhui@...edance.com>
>>>> ---
>>>> arch/riscv/include/asm/sbi.h | 1 +
>>>> drivers/firmware/riscv/Kconfig | 10 +++++
>>>> drivers/firmware/riscv/Makefile | 1 +
>>>> drivers/firmware/riscv/sse_nmi.c | 77 ++++++++++++++++++++++++++++++++
>>>> 4 files changed, 89 insertions(+)
>>>> create mode 100644 drivers/firmware/riscv/sse_nmi.c
>>>>
>>>> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
>>>> index 874cc1d7603a5..5801f90a88f62 100644
>>>> --- a/arch/riscv/include/asm/sbi.h
>>>> +++ b/arch/riscv/include/asm/sbi.h
>>>> @@ -481,6 +481,7 @@ enum sbi_sse_attr_id {
>>>>
>>>> #define SBI_SSE_EVENT_LOCAL_HIGH_PRIO_RAS 0x00000000
>>>> #define SBI_SSE_EVENT_LOCAL_DOUBLE_TRAP 0x00000001
>>>> +#define SBI_SSE_EVENT_LOCAL_UNKNOWN_NMI 0x00000002
>>>
>>> Was this submitted to the PRS WG ? This a specification modification so
>>> it should go through the usual process.
>>>
>>>> #define SBI_SSE_EVENT_GLOBAL_HIGH_PRIO_RAS 0x00008000
>>>> #define SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW 0x00010000
>>>> #define SBI_SSE_EVENT_LOCAL_LOW_PRIO_RAS 0x00100000
>>>> diff --git a/drivers/firmware/riscv/Kconfig b/drivers/firmware/riscv/Kconfig
>>>> index ed5b663ac5f91..746bac862ac46 100644
>>>> --- a/drivers/firmware/riscv/Kconfig
>>>> +++ b/drivers/firmware/riscv/Kconfig
>>>> @@ -12,4 +12,14 @@ config RISCV_SBI_SSE
>>>> this option provides support to register callbacks on specific SSE
>>>> events.
>>>>
>>>> +config RISCV_SSE_UNKNOWN_NMI
>>>> + bool "Enable SBI Supervisor Software Events unknown NMI support"
>>>> + depends on RISCV_SBI_SSE
>>>> + default y
>>>> + help
>>>> + This option enables support for delivering unknown Non-Maskable Interrupt (NMI)
>>>> + notifications via the Supervisor Software Events (SSE) framework. When enabled,
>>>> + unknown NMIs can trigger kernel responses (e.g., panic) for unrecognized critical
>>>> + hardware events, aiding in system fault diagnosis.
>>>> +
>>>> endmenu
>>>> diff --git a/drivers/firmware/riscv/Makefile b/drivers/firmware/riscv/Makefile
>>>> index c8795d4bbb2ea..9242c6cd5e3e9 100644
>>>> --- a/drivers/firmware/riscv/Makefile
>>>> +++ b/drivers/firmware/riscv/Makefile
>>>> @@ -1,3 +1,4 @@
>>>> # SPDX-License-Identifier: GPL-2.0
>>>>
>>>> obj-$(CONFIG_RISCV_SBI_SSE) += riscv_sbi_sse.o
>>>> +obj-$(CONFIG_RISCV_SSE_UNKNOWN_NMI) += sse_nmi.o
>>>> diff --git a/drivers/firmware/riscv/sse_nmi.c b/drivers/firmware/riscv/sse_nmi.c
>>>> new file mode 100644
>>>> index 0000000000000..43063f42efff0
>>>> --- /dev/null
>>>> +++ b/drivers/firmware/riscv/sse_nmi.c
>>>> @@ -0,0 +1,77 @@
>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>> +
>>>> +#include <linux/mm.h>
>>>> +#include <linux/nmi.h>
>>>> +#include <linux/riscv_sbi_sse.h>
>>>> +#include <linux/sched/debug.h>
>>>> +#include <linux/sysctl.h>
>>>> +
>>>> +#include <asm/irq_regs.h>
>>>> +#include <asm/sbi.h>
>>>> +
>>>> +int panic_on_unknown_nmi = 1;
>>>> +struct sse_event *evt;
>>>> +static struct ctl_table_header *unknown_nmi_sysctl_header;
>>>> +
>>>> +const struct ctl_table unknown_nmi_table[] = {
>>>> + {
>>>> + .procname = "panic_enable",
>>>> + .data = &panic_on_unknown_nmi,
>>>> + .maxlen = sizeof(int),
>>>> + .mode = 0644,
>>>> + .proc_handler = proc_dointvec_minmax,
>>>> + .extra1 = SYSCTL_ZERO,
>>>> + .extra2 = SYSCTL_ONE,
>>>> + },
>>>> +};
>>>> +
>>>> +static void nmi_handler(struct pt_regs *regs)
>>>> +{
>>>> + pr_emerg("NMI received for unknown on CPU %d.\n", smp_processor_id());
>>>> +
>>>> + if (panic_on_unknown_nmi)
>>>> + nmi_panic(regs, "NMI: Not continuing");
>>>> +
>>>> + pr_emerg("Dazed and confused, but trying to continue\n");
>>>> +}
>>>
>>> I'm dazed and confused as well ;) What's the point of this except
>>> interrupting the kernel with a panic ? It seems like it's a better idea
>>> to let the firmware handle that properly and display whatever
>>> information are needed. Was your idea to actually force the kernel to
>>> enter in some debug mode ?
>>
>> There is an important scenario: when the kernel becomes unresponsive,
>> we need to trigger an unknown NMI to cause the system to panic() and
>> then collect the vmcore, and such a requirement is common on x86
>> servers.
>>
>>>
>>> Thanks,
>>>
>>> Clément
>>>
>>>> +
>>>> +static int nmi_sse_handler(u32 evt, void *arg, struct pt_regs *regs)
>>>> +{
>>>> + nmi_handler(regs);
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int sse_nmi_init(void)
>>>> +{
>>>> + int ret;
>>>> +
>>>> + evt = sse_event_register(SBI_SSE_EVENT_LOCAL_UNKNOWN_NMI, 0,
>>>> + nmi_sse_handler, NULL);
>
> Should we add this UNKNOWN_NMI event ID in Chapter 17 of the SBI spec?
Hi Yunhui,
If you want it to be part of the spec, that should indeed be submitted
for review.
Thanks,
Clément
>
>
>>>> + if (IS_ERR(evt))
>>>> + return PTR_ERR(evt);
>>>> +
>>>> + ret = sse_event_enable(evt);
>>>> + if (ret) {
>>>> + sse_event_unregister(evt);
>>>> + return ret;
>>>> + }
>>>> +
>>>> + unknown_nmi_sysctl_header = register_sysctl("kernel", unknown_nmi_table);
>>>> + if (!unknown_nmi_sysctl_header) {
>>>> + sse_event_disable(evt);
>>>> + sse_event_unregister(evt);
>>>> + return -ENOMEM;
>>>> + }
>>>> +
>>>> + pr_info("Using SSE for NMI event delivery\n");
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int __init unknow_nmi_init(void)
>>>> +{
>>>> + return sse_nmi_init();
>>>> +}
>>>> +
>>>> +late_initcall(unknow_nmi_init);
>>>
>>
>> Thanks,
>> Yunhui
>
> Thanks,
> Yunhui
Powered by blists - more mailing lists