Message-Id: <DEULRAM55E2G.1464D4JSWL3SZ@ventanamicro.com>
Date: Wed, 10 Dec 2025 23:22:29 +0900
From: Radim Krčmář <rkrcmar@...tanamicro.com>
To: "yunhui cui" <cuiyunhui@...edance.com>
Cc: <conor@...nel.org>, <paul.walmsley@...ive.com>, <palmer@...belt.com>,
<aou@...s.berkeley.edu>, <alex@...ti.fr>, <luxu.kernel@...edance.com>,
<linux-kernel@...r.kernel.org>, <linux-riscv@...ts.infradead.org>,
<jassisinghbrar@...il.com>, <conor.dooley@...rochip.com>,
<valentina.fernandezalanis@...rochip.com>, <catalin.marinas@....com>,
<will@...nel.org>, <maz@...nel.org>, <timothy.hayes@....com>,
<lpieralisi@...nel.org>, <arnd@...db.de>, <kees@...nel.org>,
<tglx@...utronix.de>, <viresh.kumar@...aro.org>, <boqun.feng@...il.com>,
<linux-arm-kernel@...ts.infradead.org>, <cleger@...osinc.com>,
<atishp@...osinc.com>, <ajones@...tanamicro.com>, "linux-riscv"
<linux-riscv-bounces@...ts.infradead.org>
Subject: Re: [External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
2025-12-08T19:40:39+08:00, yunhui cui <cuiyunhui@...edance.com>:
> Hi Radim,
>
> On Thu, Dec 4, 2025 at 9:16 PM Radim Krčmář <rkrcmar@...tanamicro.com> wrote:
>>
>> 2025-12-04T13:28:45+08:00, yunhui cui <cuiyunhui@...edance.com>:
>> > Hi Radim,
>> >
>> > On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar@...tanamicro.com> wrote:
>> >>
>> >> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@...edance.com>:
>> >> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
>> >> >
>> >> > Signed-off-by: Yunhui Cui <cuiyunhui@...edance.com>
>> >> > ---
>> >> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
>> >> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
>> >> > type = atomic_read(this_cpu_ptr(&local_nmi));
>> >> >
>> >> > NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
>> >> > + NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
>> >>
>> >> Please document the intended preemption design for all SSE events,
>> >> because it will be a nightmare if we forget some assumptions in the
>> >> coming years. (That includes the relative priorities of RAS/PMU/...)
>> >
>> > Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
>> > LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
>> > SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
>> > preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.
>>
>> That is how it is. I don't understand why it must be like that.
>>
>> For example: PMU_OVERFLOW has lower event_id than SOFTWARE_INJECTED, so
>> it will currently interrupt NMI_CRASH as they both have priority 0,
>> although NMI_CRASH probably shouldn't be masked by anything, and should
>> preempt everything.
>> NMI_BACKTRACE, on the other hand, probably shouldn't have that high
>> priority as there seem more important events (e.g. RAS and NMI_CRASH).
>>
>> The issues can be avoided by event priorities, masking, or deemed as
>> non-issue, but I think it would be beneficial to provide some reasoning
>> behind the design, as the choices don't seem obvious to me.
>
> Indeed, it is necessary to consider the priority among different
> events. Should different priorities also be assigned to NMI_CRASH,
> NMI_BACKTRACE, NMI_STOP, and NMI_KGDB?
I think it would be beneficial to document the desired behavior even if
we can't (currently?) implement it, because like you said, SSE can't
directly express the priority when multiplexing SOFTWARE_INJECTED.
> Do these operations need to be
> visible to the BIOS?
The BIOS shouldn't care what lower privilege levels want to do.
SBI could define more events for software use, though.
> Could you kindly provide some good suggestions?
I think it would be good practice to explicitly set a unique priority
when registering SSE events. Maybe through a global priority enum, with
all event registrations passing a value from that enum.
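To make the enum idea concrete, here is a minimal standalone sketch. It
is not the kernel SSE API: the enum names, their order, and the
sse_claim_priority() helper are all hypothetical, and it assumes that a
lower value means a higher priority. It only shows how one table could
reject two different events asking for the same priority.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical global priority table. The order is illustrative only;
 * a lower value is assumed to mean a higher priority.
 */
enum sse_prio {
	SSE_PRIO_SOFTWARE_INJECTED = 0,
	SSE_PRIO_RAS,
	SSE_PRIO_PMU_OVERFLOW,
	SSE_PRIO_UNKNOWN_NMI,
	SSE_PRIO_COUNT,
};

/* Which event currently owns each priority slot. */
static uint32_t sse_prio_owner[SSE_PRIO_COUNT];
static bool sse_prio_used[SSE_PRIO_COUNT];

/*
 * Hypothetical helper a registration path could call first.
 * Returns 0 on success, -1 if the priority is out of range or is
 * already owned by a different event.
 */
static int sse_claim_priority(uint32_t evt, enum sse_prio prio)
{
	if ((unsigned int)prio >= SSE_PRIO_COUNT)
		return -1;
	if (sse_prio_used[prio] && sse_prio_owner[prio] != evt)
		return -1;

	sse_prio_used[prio] = true;
	sse_prio_owner[prio] = evt;
	return 0;
}
```

With something like this, forgetting to pick a priority, or reusing one,
fails at registration time instead of silently falling back to the
event_id tie-break.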
That would make sure that different events interact like we expect them
to, but it doesn't solve the multiplexing issue of SOFTWARE_INJECTED.
If we're fine with all SOFTWARE_INJECTED sub-handlers having the maximal
priority (higher than RAS/PMU/UNKNOWN_NMI/...), then we could hope that
lower importance handlers (e.g. BACKTRACE) won't hang, so the higher
importance handlers (e.g. CRASH) would eventually run.
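As a rough illustration of ordering the sub-handlers in software, the
sketch below picks the most important pending sub-event first. The
pending bitmask, the enum names, and nmi_pick_next() are assumptions
made for this example and are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical sub-event order: a lower value is more important. */
enum nmi_sub {
	NMI_SUB_CRASH,
	NMI_SUB_STOP,
	NMI_SUB_KGDB,
	NMI_SUB_BACKTRACE,
	NMI_SUB_COUNT,
};

/*
 * Return the most important sub-event set in the pending bitmask,
 * or -1 if nothing is pending.
 */
static int nmi_pick_next(uint32_t pending)
{
	for (int i = 0; i < NMI_SUB_COUNT; i++) {
		if (pending & (1u << i))
			return i;
	}
	return -1;
}
```

This only orders sub-events that are pending at dispatch time. A
BACKTRACE that is already running still has to return before CRASH gets
its turn, which is the limitation described above.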
We're dealing with low-occurrence scenarios, so this might be "good
enough for now"...
The situation would get simpler if we could avoid some sub-handlers.
Conversely, it would get more complicated if SOFTWARE_INJECTED had
lower priority than some other event: we'd have to make CRASH partially
regain its high priority by masking other SSE events during its
execution (and we'd need warding amulets against hangs and starvation).
Thanks.