lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250619085717.GB1613376@noisy.programming.kicks-ass.net>
Date: Thu, 19 Jun 2025 10:57:17 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
	bpf@...r.kernel.org, x86@...nel.org,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Josh Poimboeuf <jpoimboe@...nel.org>,
	Ingo Molnar <mingo@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
	Namhyung Kim <namhyung@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrii Nakryiko <andrii@...nel.org>,
	Indu Bhagat <indu.bhagat@...cle.com>,
	"Jose E. Marchesi" <jemarch@....org>,
	Beau Belgrave <beaub@...ux.microsoft.com>,
	Jens Remus <jremus@...ux.ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v10 07/14] unwind_user/deferred: Make unwind deferral
 requests NMI-safe

On Tue, Jun 10, 2025 at 08:54:28PM -0400, Steven Rostedt wrote:

> +static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *timestamp)
> +{
> +	struct unwind_task_info *info = &current->unwind_info;
> +	bool inited_timestamp = false;
> +	int ret;
> +
> +	/* Always use the nmi_timestamp first */
> +	*timestamp = info->nmi_timestamp ? : info->timestamp;
> +
> +	if (!*timestamp) {
> +		/*
> +		 * This is the first unwind request since the most recent entry
> +		 * from user space. Initialize the task timestamp.
> +		 *
> +		 * Don't write to info->timestamp directly, otherwise it may race
> +		 * with an interruption of get_timestamp().
> +		 */
> +		info->nmi_timestamp = local_clock();
> +		*timestamp = info->nmi_timestamp;
> +		inited_timestamp = true;
> +	}
> +
> +	if (info->pending)
> +		return 1;
> +
> +	ret = task_work_add(current, &info->work, TWA_NMI_CURRENT);
> +	if (ret < 0) {
> +		/*
> +		 * If this set nmi_timestamp and is not using it,
> +		 * there's no guarantee that it will be used.
> +		 * Set it back to zero.
> +		 */
> +		if (inited_timestamp)
> +			info->nmi_timestamp = 0;
> +		return ret;
> +	}
> +
> +	info->pending = 1;
> +
> +	return 0;
> +}
> +
>  /**
>   * unwind_deferred_request - Request a user stacktrace on task exit
>   * @work: Unwind descriptor requesting the trace
> @@ -139,31 +207,38 @@ static void unwind_deferred_task_work(struct callback_head *head)
>  int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
>  {
>  	struct unwind_task_info *info = &current->unwind_info;
> +	int pending;
>  	int ret;
>  
>  	*timestamp = 0;
>  
>  	if ((current->flags & (PF_KTHREAD | PF_EXITING)) ||
>  	    !user_mode(task_pt_regs(current)))
>  		return -EINVAL;
>  
> +	if (in_nmi())
> +		return unwind_deferred_request_nmi(work, timestamp);

So nested NMI is a thing -- AFAICT this is broken in the face of nested
NMI.

Specifically, we mark all exceptions that can happen with IRQs disabled
as NMI like (so that they don't go about taking locks etc.).

So imagine you're in #DB, you're asking for an unwind, you do the above
dance and get hit with NMI.

Then you get the NMI setting nmi_timestamp, and #DB overwriting it with
a later value, and you're back up the creek without no paddles.


Mix that with local_clock() that is only monotonic on a single CPU. And
you ask for an unwind on CPU0, get migrated to CPU1 which for the
argument will be behind, and see a timestamp 'far' in the future.



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ