Message-ID: <20200225162121.GA9599@lenoir>
Date:   Tue, 25 Feb 2020 17:21:21 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
        rostedt@...dmis.org, mingo@...nel.org, joel@...lfernandes.org,
        gregkh@...uxfoundation.org, gustavo@...eddedor.com,
        tglx@...utronix.de, paulmck@...nel.org, josh@...htriplett.org,
        mathieu.desnoyers@...icios.com, jiangshanlai@...il.com,
        luto@...nel.org, tony.luck@...el.com, dan.carpenter@...cle.com,
        mhiramat@...nel.org, Will Deacon <will@...nel.org>,
        Petr Mladek <pmladek@...e.com>, Marc Zyngier <maz@...nel.org>
Subject: Re: [PATCH v4 02/27] hardirq/nmi: Allow nested nmi_enter()

On Tue, Feb 25, 2020 at 04:41:11PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 25, 2020 at 04:09:06AM +0100, Frederic Weisbecker wrote:
> > On Mon, Feb 24, 2020 at 05:13:18PM +0100, Peter Zijlstra wrote:
> 
> > > +#define arch_nmi_enter()						\
> > > +do {									\
> > > +	struct nmi_ctx *___ctx;						\
> > > +	unsigned int ___cnt;						\
> > > +									\
> > > +	if (!is_kernel_in_hyp_mode() || in_nmi())			\
> > > +		break;							\
> > > +									\
> > > +	___ctx = this_cpu_ptr(&nmi_contexts);				\
> > > +	___cnt = ___ctx->cnt;						\
> > > +	if (!(___cnt & 1) && __cnt) {					\
> > > +		___ctx->cnt += 2;					\
> > > +		break;							\
> > > +	}								\
> > > +									\
> > > +	___ctx->cnt |= 1;						\
> > > +	barrier();							\
> > > +	nmi_ctx->hcr = read_sysreg(hcr_el2);				\
> > > +	if (!(nmi_ctx->hcr & HCR_TGE)) {				\
> > > +		write_sysreg(nmi_ctx->hcr | HCR_TGE, hcr_el2);		\
> > > +		isb();							\
> > > +	}								\
> > > +	barrier();							\
> > 
> > Suppose the first NMI is interrupted here. nmi_ctx->hcr has HCR_TGE unset.
> > The new NMI is going to overwrite nmi_ctx->hcr with HCR_TGE set. Then the
> > first NMI will not restore the correct value upon arch_nmi_exit().
> > 
> > So perhaps the below, but I bet I overlooked something obvious.
> 
> Well, none of this is obvious :/
> 
> The basic idea was that the LSB signifies 'pending/in-progress' and when
> that is set, nobody else touches no nothing. Enter will unconditionally
> (re) write_sysreg(), exit will nothing.
> 
> Obviously I messed that up.
> 
> How's this? 
> 
> #define arch_nmi_enter()						\
> do {									\
> 	struct nmi_ctx *___ctx;						\
> 	unsigned int ___cnt;						\
> 									\
> 	if (!is_kernel_in_hyp_mode() || in_nmi())			\
> 		break;							\
> 									\
> 	___ctx = this_cpu_ptr(&nmi_contexts);				\
> 	___cnt = ___ctx->cnt;						\
> 	if (!(___cnt & 1)) { /* !IN-PROGRESS */				\
> 		if (___cnt) {						\
> 			___ctx->cnt += 2;				\
> 			break;						\
> 		}							\
> 									\
> 		___ctx->hcr = read_sysreg(hcr_el2);			\
> 		barrier();						\
> 		___ctx->cnt |= 1; /* IN-PROGRESS */			\
> 		barrier();						\
> 	}								\
> 									\
> 	if (!(___ctx->hcr & HCR_TGE)) {					\
> 		write_sysreg(___ctx->hcr | HCR_TGE, hcr_el2);		\
> 		isb();							\
> 	}								\
> 	barrier();							\
> 	if (!(___cnt & 1))						\
> 		___ctx->cnt++; /* COMPLETE */				\
> } while (0)
> 
> #define arch_nmi_exit()							\
> do {									\
> 	struct nmi_ctx *___ctx;						\
> 									\
> 	if (!is_kernel_in_hyp_mode() || in_nmi())			\
> 		break;							\
> 									\
> 	___ctx = this_cpu_ptr(&nmi_contexts);				\
> 	if ((___ctx->cnt & 1) || (___ctx->cnt -= 2))			\
> 		break;							\

If you're interrupted here and ___ctx->cnt == 0, the new NMI is within its
rights to overwrite ___ctx->hcr. It will find HCR_TGE set in the sysreg and
save that value into ___ctx->hcr. The restore below is then skipped, so
hcr_el2 is left with HCR_TGE set.

> 									\
> 	if (!(___ctx->hcr & HCR_TGE))					\
> 		write_sysreg(___ctx->hcr, hcr_el2);			\
> } while (0)
