linux-kernel - Re: [PATCH v4 01/39] task_work: Fix TWA_NMI_CURRENT error handling


Message-ID: <20250122204720.t42a4em5endxox3y@jpoimboe>
Date: Wed, 22 Jan 2025 12:47:20 -0800
From: Josh Poimboeuf <jpoimboe@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: x86@...nel.org, Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...nel.org>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	linux-kernel@...r.kernel.org, Indu Bhagat <indu.bhagat@...cle.com>,
	Mark Rutland <mark.rutland@....com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
	Ian Rogers <irogers@...gle.com>,
	Adrian Hunter <adrian.hunter@...el.com>,
	linux-perf-users@...r.kernel.org, Mark Brown <broonie@...nel.org>,
	linux-toolchains@...r.kernel.org, Jordan Rome <jordalgo@...a.com>,
	Sam James <sam@...too.org>, linux-trace-kernel@...r.kernel.org,
	Andrii Nakryiko <andrii.nakryiko@...il.com>,
	Jens Remus <jremus@...ux.ibm.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Florian Weimer <fweimer@...hat.com>,
	Andy Lutomirski <luto@...nel.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Weinan Liu <wnliu@...gle.com>
Subject: Re: [PATCH v4 01/39] task_work: Fix TWA_NMI_CURRENT error handling

On Wed, Jan 22, 2025 at 01:28:21PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 21, 2025 at 06:30:53PM -0800, Josh Poimboeuf wrote:
> > It's possible for irq_work_queue() to fail if the work has already been
> > claimed.  That can happen if a TWA_NMI_CURRENT task work is requested
> > before a previous TWA_NMI_CURRENT IRQ work on the same CPU has gotten a
> > chance to run.
> 
> I'm confused, if it fails then it's already pending, and we'll get the
> notification already. You can still add the work.

Yeah, I suppose that makes sense.  If the pending irq_work is already
going to set TIF_NOTIFY_RESUME anyway, there's no need to do that again.
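The "already pending" behaviour discussed above can be modelled in userspace. This is only a sketch under stated assumptions: `fake_irq_work`, `fake_irq_work_queue()` and `fake_irq_work_run()` are hypothetical stand-ins, not the kernel's irq_work API. The sketch shows only that a second queue attempt reporting failure does not mean a lost notification, since the first claim still delivers one.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for an irq_work item; not the kernel type. */
struct fake_irq_work {
	atomic_bool pending;
};

/*
 * Try to claim the work.  Returns true if this call claimed it, false if
 * it was already pending (i.e. a notification is already on its way).
 */
static bool fake_irq_work_queue(struct fake_irq_work *w)
{
	return !atomic_exchange(&w->pending, true);
}

/*
 * Model the handler running: if the work was pending, deliver exactly one
 * notification and make the work claimable again.
 */
static void fake_irq_work_run(struct fake_irq_work *w, int *notifications)
{
	if (atomic_exchange(&w->pending, false))
		(*notifications)++;
}
```

In this model a caller that sees `false` from the queue function can still add its work item, because the pending handler will run and notify once for everything queued before it.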

> > The error has to be checked before the write to task->task_works.  Also
> > the try_cmpxchg() loop isn't needed in NMI context.  The TWA_NMI_CURRENT
> > case really is special, keep things simple by keeping its code all
> > together in one place.
> 
> NMIs can nest,

Just for my understanding: for nested NMIs, the entry code basically
queues up the next NMI, so the C handler (exc_nmi) can't nest.  Right?

> consider #DB (which is NMI like)

What exactly do you mean by "NMI like"?  Is it because a #DB might be
basically running in NMI context, if the NMI hit a breakpoint?

> doing task_work_add() and getting interrupted with NMI doing the same.

How exactly would that work?  At least with my patch the #DB wouldn't be
able to use TWA_NMI_CURRENT unless in_nmi() were true due to NMI hitting
a breakpoint.  In which case a nested NMI wouldn't actually nest, it
would get "queued" by the entry code.

But yeah, I do see how the reverse can be true: somebody sets a
breakpoint in task_work, right where it's fiddling with the list head.
NMI calls task_work_add(TWA_NMI_CURRENT), triggering the #DB, which also
calls task_work_add().
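The reverse scenario above can be reproduced deterministically in a userspace model. This is a sketch, not the kernel's task_work_add(): `struct node`, `push_plain()`, `push_cmpxchg()` and the `nested` hook are hypothetical names, and the hook merely simulates an exception firing between the read of the list head and its update. With a plain store the nested push is lost; with a compare-exchange loop both entries survive.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
};

static _Atomic(struct node *) head;

/* Hypothetical hook: simulates an exception hitting mid-update. */
typedef void (*nested_fn)(void);

/* Push without retry: a nested push between load and store is overwritten. */
static void push_plain(struct node *n, nested_fn nested)
{
	n->next = atomic_load(&head);
	if (nested)
		nested();
	atomic_store(&head, n);
}

/* Push with a compare-exchange loop: a nested push forces one retry. */
static void push_cmpxchg(struct node *n, nested_fn nested)
{
	struct node *old = atomic_load(&head);

	do {
		n->next = old;
		if (nested) {
			nested();	/* fires once, on the first attempt only */
			nested = NULL;
		}
		/* On failure, 'old' is refreshed with the current head. */
	} while (!atomic_compare_exchange_strong(&head, &old, n));
}

static struct node inner;

/* The nested context adds its own entry to the same list. */
static void nested_push(void)
{
	push_cmpxchg(&inner, NULL);
}

static int list_len(void)
{
	int len = 0;

	for (struct node *n = atomic_load(&head); n; n = n->next)
		len++;
	return len;
}
```

Calling `push_plain(&outer, nested_push)` on an empty list leaves one entry, while `push_cmpxchg(&outer, nested_push)` leaves two.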

-- 
Josh