Message-ID: <aD7ZCnNUqxb9XWNh@e129823.arm.com>
Date: Tue, 3 Jun 2025 12:14:18 +0100
From: Yeoreum Yun <yeoreum.yun@....com>
To: Mark Rutland <mark.rutland@....com>
Cc: Will Deacon <will@...nel.org>, catalin.marinas@....com,
	geert@...ux-m68k.org, broonie@...nel.org, mcgrof@...nel.org,
	joey.gouly@....com, kristina.martsenko@....com, rppt@...nel.org,
	pcc@...gle.com, bigeasy@...utronix.de, ptosi@...gle.com,
	james.morse@....com, linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org, ada.coupriediaz@....com
Subject: Re: [PATCH] arm64/trap: fix broken ct->nmi_nesting when die() is
 called in a kthread

Hi Mark,

> On Mon, Jun 02, 2025 at 06:50:53PM +0100, Yeoreum Yun wrote:
> > > One of the reasons we treat BRK as an NMI is that exception entry for
> > > BRK will leave all DAIF bits set, whereas schedule() should be called
> > > with debug and SError unmasked (but IRQ+FIQ masked). Generally, calling
> > > ct_nmi_enter() prevents preemption (and hence calls to schedule()).
> >
> > I think ct_nmi_enter() doesn't prevent preemption; it is
> > debug_exception_enter() that disables preemption.
>
> Yep, sorry for the confusion there. I had erroneously pattern-matched on
> the nmi_nesting values and I had confused that with the similar
> manipulation of the preempt count.
>
> > > Another is that we may have a BUG() or WARN() in entry code where the
> > > task could be in an inconsistent state, and we need to treat the
> > > exception like an NMI to avoid consuming that inconsistent state.
> >
> > So, let's think of the "inconsistent" state like:
> >   -> el0_enter()
> > 	  -> enter_from_user_mode()
> > 		  -> before update ct_state (context_tracking.state), call BUG()/WARN()
> > 			  -> el1_dbg()
> >
> > It needs to call ct_nmi_enter() in el1_dbg(), right?
>
> Yes. The critical things are that RCU may not be watching, and all other
> entry accounting may be in an intermediate/inconsistent state, since the
> BUG()/WARN() could be anywhere in that C code. Currently that means we
> must call ct_nmi_enter().
>
> The other problem to bear in mind is that we don't have a way to
> distinguish these BUG()/WARN() cases from others throughout the kernel,
> which is why we currently unconditionally treat this as an NMI entry.
>
> > > To handle that properly, we need to:
> > >
> > > (a) Figure out what to do with entry code. Last I looked I was under the
> > >     impression that x86 either didn't have a problem here, or simply
> > >     ignored it.
> >
> > TBH, in the above case, it seems x86's context_tracking.state will be broken...
>
> That's certainly possible, that was the impression I had last time I
> looked, but I haven't looked at this in detail for a short while, and I
> may have missed something.
>
> > > (b) Handle BUG/WARN traps separately from other BRKs, such that we can
> > >     use local_daif_inherit(), and treat this as a special function call
> > >     rather than an NMI.
> > >
> > > (c) Somehow teach make_task_dead() to handle the case where DAIF.D
> > >     and/or DAIF.A are set. Most likely we simply have to panic() here,
> > >     as with BUG() in interrupt context.
> >
> > Right... It should handle the DAIF.D and DAIF.A bits...
>
> Yes.
>
> [...]

Thanks for the clarification :D

>
> > > As-is, I think an extra warning in the case of a BUG() is fine given
> > > the larger functional issues.
> > >
> > > I do not think this patch is correct as-is.
> >
> > So, what I think:
> >   1. arm64_enter_el1_dbg() should call ct_nmi_enter() as it does now.
> >   2. in bug_handler(), while handling BUG_TYPE, add the above
> >      conditional ct_nmi_exit() call.
> >   3. DAIF.D and DAIF.A handling.
>
> No, that is not safe. In step 2, calling ct_nmi_exit() would undo *all*
> of the ct_nmi_enter() logic, and may stop RCU from watching if the
> exception was entered from some intermediate/inconsistent state.

Yes, if ct_nmi_enter() is called without a condition.
But I meant it with the condition check that I posted.
In the CT_NESTING_IRQ_NONIDLE case, the call wouldn't be needed
and that CPU can still be watched by RCU.
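
To make the nesting arithmetic concrete, here is a toy user-space model
in C. It is a sketch, not kernel code: it assumes, as a simplification,
that ct_nmi_enter() bumps nmi_nesting by 1 when RCU was not yet watching
and by 2 otherwise, and that ct_nmi_exit() reverses this. struct toy_ct
and all toy_* names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for CT_NESTING_IRQ_NONIDLE (assumed value). */
#define TOY_NESTING_NONIDLE ((LONG_MAX / 2) + 1)

/* Toy per-CPU state: models ct->nmi_nesting and "RCU is watching". */
struct toy_ct {
	long nmi_nesting;
	bool rcu_watching;
};

/* Simplified model of ct_nmi_enter(). */
static void toy_nmi_enter(struct toy_ct *ct)
{
	long incby = 2;

	if (!ct->rcu_watching) {
		/* Entered from a state RCU was not watching. */
		ct->rcu_watching = true;
		incby = 1;
	}
	ct->nmi_nesting += incby;
}

/* Simplified model of ct_nmi_exit(). */
static void toy_nmi_exit(struct toy_ct *ct)
{
	assert(ct->nmi_nesting > 0 && ct->rcu_watching);

	if (ct->nmi_nesting != 1) {
		/* Nested case: drop one level, RCU keeps watching. */
		ct->nmi_nesting -= 2;
		return;
	}
	/* Outermost case: RCU stops watching again. */
	ct->nmi_nesting = 0;
	ct->rcu_watching = false;
}

/*
 * Models one BRK exception. If the handler never returns (as when
 * die() ends the task), the exit half is skipped. Returns how far
 * nmi_nesting ends up from where it started.
 */
static long toy_handle_brk(struct toy_ct *ct, bool handler_returns)
{
	long before = ct->nmi_nesting;

	toy_nmi_enter(ct);
	if (handler_returns)
		toy_nmi_exit(ct);
	return ct->nmi_nesting - before;
}
```

Starting from TOY_NESTING_NONIDLE with RCU watching,
toy_handle_brk(&ct, false) leaves the counter 2 above its starting
value, which is the kind of imbalance the subject line is about.
Starting from nesting 0 with RCU not watching, toy_nmi_exit() after
toy_nmi_enter() stops RCU from watching again, which is the case
Mark describes.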

>
> If we want to change anything now, it should be the DAIF.DA handling,
> but even for that I'm not sure what the best approach is, and that'll
> require some changes to core code.
>
> Please leave this as-is for now.
>

Not now, then. I'll wait until Ada's patch is merged,
and then let me talk with you again, please.

Thanks for your confirmation again!

--
Sincerely,
Yeoreum Yun
