Message-ID: <aD7ZCnNUqxb9XWNh@e129823.arm.com>
Date: Tue, 3 Jun 2025 12:14:18 +0100
From: Yeoreum Yun <yeoreum.yun@....com>
To: Mark Rutland <mark.rutland@....com>
Cc: Will Deacon <will@...nel.org>, catalin.marinas@....com,
geert@...ux-m68k.org, broonie@...nel.org, mcgrof@...nel.org,
joey.gouly@....com, kristina.martsenko@....com, rppt@...nel.org,
pcc@...gle.com, bigeasy@...utronix.de, ptosi@...gle.com,
james.morse@....com, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, ada.coupriediaz@....com
Subject: Re: [PATCH] arm64/trap: fix broken ct->nmi_nesting when die() is
called in a kthread
Hi Mark,
> On Mon, Jun 02, 2025 at 06:50:53PM +0100, Yeoreum Yun wrote:
> > > One of the reasons we treat BRK as an NMI is that exception entry for
> > > BRK will leave all DAIF bits set, whereas schedule() should be called
> > > with debug and SError unmasked (but IRQ+FIQ masked). Generally, calling
> > > ct_nmi_enter() prevents preemption (and hence calls to schedule()).
> >
> > I think ct_nmi_enter() doesn't prevent preemption, but
> > debug_exception_enter() disables preemption.
>
> Yep, sorry for the confusion there. I had erroneously pattern-matched on
> the nmi_nesting values and I had confused that with the similar
> manipulation of the preempt count.
>
> > > Another is that we may have a BUG() or WARN() in entry code where the
> > > task could be in an inconsistent state, and we need to treat the
> > > exception like an NMI to avoid consuming that inconsistent state.
> >
> > So, let's think of the "inconsistent" state like:
> > -> el0_enter()
> > -> enter_from_user_mode()
> > -> before update ct_state (context_tracking.state), call BUG()/WARN()
> > -> el1_dbg()
> >
> > It needs to call ct_nmi_enter() in el1_dbg(), right?
>
> Yes. The critical things are that RCU may not be watching, and all other
> entry accounting may be in an intermediate/inconsistent state, since the
> BUG()/WARN() could be anywhere in that C code. Currently that means we
> must call ct_nmi_enter().
>
> The other problem to bear in mind is that we don't have a way to
> distinguish these BUG()/WARN() cases from others throughout the kernel,
> which is why we currently unconditionally treat this as an NMI entry.
>
> > > To handle that properly, we need to:
> > >
> > > (a) Figure out what to do with entry code. Last I looked I was under the
> > > impression that x86 either didn't have a problem here, or simply
> > > ignored it.
> >
> > TBH, in the above case, it seems x86's context_tracking.state will be broken...
>
> That's certainly possible, that was the impression I had last time I
> looked, but I haven't looked at this in detail for a short while, and I
> may have missed something.
>
> > > (b) Handle BUG/WARN traps separately from other BRKs, such that we can
> > > use local_daif_inherit(), and treat this as a special function call
> > > rather than an NMI.
> > >
> > > (c) Somehow teach make_task_dead() to handle the case where DAIF.D
> > > and/or DAIF.A are set. Most likely we simply have to panic() here,
> > > as with BUG() in interrupt context.
> >
> > Right... It should handle the DAIF.D and DAIF.A bits...
>
> Yes.
>
> [...]
Thanks for the clarification :D
>
> > > As-is, I think an extra warning in the case of a BUG() is fine given
> > > the larger functional issues.
> > >
> > > I do not think this patch is correct as-is.
> >
> > So, what I think:
> > 1. arm64_enter_el1_dbg() should call ct_nmi_enter() as it does now.
> > 2. in bug_handler(), while handling BUG_TYPE, add the above conditional
> > ct_nmi_exit() call.
> > 3. DAIF.D and DAIF.A handling.
>
> No, that is not safe. In step 2, calling ct_nmi_exit() would undo *all*
> of the ct_nmi_enter() logic, and may stop RCU from watching if the
> exception was entered from some intermediate/inconsistent state.
Yes, if ct_nmi_enter() is called without a condition.
But I meant it together with the condition check I posted:
in the CT_NESTING_IRQ_NONIDLE case, it wouldn't need the call,
and that CPU can still be watched by RCU.
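To make the nesting arithmetic behind this point concrete, here is a toy
user-space model. It is only a sketch and not the kernel code or the posted
patch: struct ct_model and the model_*() helpers are hypothetical names, and
the logic is a simplified reading of ct_nmi_enter()/ct_nmi_exit() with all
ordering, tracing and per-CPU details left out.

```c
#include <limits.h>
#include <stdbool.h>

/* Value the kernel uses for "in task/IRQ context, not idle". */
#define CT_NESTING_IRQ_NONIDLE ((LONG_MAX / 2) + 1)

/* Hypothetical, simplified stand-in for the per-CPU context tracking state. */
struct ct_model {
	long nmi_nesting;
	bool rcu_watching;
};

/* Model of ct_nmi_enter(): +1 if RCU had to start watching, +2 otherwise. */
static void model_nmi_enter(struct ct_model *ct)
{
	long incby = 2;

	if (!ct->rcu_watching) {
		ct->rcu_watching = true;
		incby = 1;
	}
	ct->nmi_nesting += incby;
}

/* Model of ct_nmi_exit(): only the outermost exit (nesting == 1) stops RCU. */
static void model_nmi_exit(struct ct_model *ct)
{
	if (ct->nmi_nesting != 1) {
		ct->nmi_nesting -= 2;
		return;
	}
	ct->nmi_nesting = 0;
	ct->rcu_watching = false;
}
```

In this model, an entry from CT_NESTING_IRQ_NONIDLE goes to
CT_NESTING_IRQ_NONIDLE + 2 and the matching exit returns to
CT_NESTING_IRQ_NONIDLE with RCU still watching. An entry from a state where
RCU was not watching goes from 0 to 1, and the matching exit stops RCU from
watching again, which is the unsafe case described above.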
>
> If we want to change anything now, it should be the DAIF.DA handling,
> but even for that I'm not sure what the best approach is, and that'll
> require some changes to core code.
>
> Please leave this as-is for now.
>
Not now, then. I'll wait for Ada's patch to be merged,
and then let's discuss this again, please.
Thanks again for your confirmation!
--
Sincerely,
Yeoreum Yun