lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250423102903.37f9d07c@gandalf.local.home>
Date: Wed, 23 Apr 2025 10:29:03 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Stephen Rothwell <sfr@...b.auug.org.au>
Cc: Libo Chen <libo.chen@...cle.com>, Andrew Morton
 <akpm@...ux-foundation.org>, Linux Kernel Mailing List
 <linux-kernel@...r.kernel.org>, Linux Next Mailing List
 <linux-next@...r.kernel.org>
Subject: Re: linux-next: runtime warning after merge of the mm-unstable tree

On Tue, 22 Apr 2025 19:28:16 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:

> On Wed, 23 Apr 2025 09:16:56 +1000
> Stephen Rothwell <sfr@...b.auug.org.au> wrote:
> 
> > On Tue, 22 Apr 2025 11:05:10 -0700 Libo Chen <libo.chen@...cle.com> wrote:  
> > > 
> > > Just to verify, does the build have commit 'tracing: Verify event
> > > formats that have "%*p.."'[1] or not? I do see linux-next/master has
> > > it but just want to first confirm that.     
> > 
> > Yes, it does have that commit (since it has been in Linus' tree since
> > April 4).  
> 
> Looks like we need to teach the verifier about nodemask_pr_args() :-/
> 

No, actually, the verifier found a bug!

The event is:

> TRACE_EVENT(sched_skip_cpuset_numa,
> 
> 	TP_PROTO(struct task_struct *tsk, nodemask_t *mem_allowed_ptr),
> 
> 	TP_ARGS(tsk, mem_allowed_ptr),
> 
> 	TP_STRUCT__entry(
> 		__array( char,		comm,		TASK_COMM_LEN	)
> 		__field( pid_t,		pid				)
> 		__field( pid_t,		tgid				)
> 		__field( pid_t,		ngid				)
> 		__field( nodemask_t *,	mem_allowed_ptr			)
> 	),
> 
> 	TP_fast_assign(
> 		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
> 		__entry->pid		 = task_pid_nr(tsk);
> 		__entry->tgid		 = task_tgid_nr(tsk);
> 		__entry->ngid		 = task_numa_group_id(tsk);
> 		__entry->mem_allowed_ptr = mem_allowed_ptr;

Here, we copy the address of mem_allowed_ptr into the ring buffer. The
address is copied, not the mask itself.

> 	),
> 
> 	TP_printk("comm=%s pid=%d tgid=%d ngid=%d mem_nodes_allowed=%*pbl",
> 		  __entry->comm,
> 		  __entry->pid,
> 		  __entry->tgid,
> 		  __entry->ngid,
> 		  nodemask_pr_args(__entry->mem_allowed_ptr))
> );

The above nodemask_ptr_args() is defined as:

> #define nodemask_pr_args(maskp)	__nodemask_pr_numnodes(maskp), \
> 				__nodemask_pr_bits(maskp)
> static __always_inline unsigned int __nodemask_pr_numnodes(const nodemask_t *m)
> {
> 	return m ? MAX_NUMNODES : 0;
> }
> static __always_inline const unsigned long *__nodemask_pr_bits(const nodemask_t *m)
> {
> 	return m ? m->bits : NULL;
> }

Here we see it dereferences the pointer to get bits.

The TP_fast_assign() can happen seconds, minutes, days, months and possibly
years! before the TP_printk() is executed.

There's no guarantee that the mem_allowed_ptr will still be around when
printed and can cause a kernel crash! This is exactly what the verifier is
looking for.

The real fix is not to dereference the pointer and do the conversion in the
TP_fast_assign().

-- Steve

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ