[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250423102903.37f9d07c@gandalf.local.home>
Date: Wed, 23 Apr 2025 10:29:03 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Stephen Rothwell <sfr@...b.auug.org.au>
Cc: Libo Chen <libo.chen@...cle.com>, Andrew Morton
<akpm@...ux-foundation.org>, Linux Kernel Mailing List
<linux-kernel@...r.kernel.org>, Linux Next Mailing List
<linux-next@...r.kernel.org>
Subject: Re: linux-next: runtime warning after merge of the mm-unstable tree
On Tue, 22 Apr 2025 19:28:16 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:
> On Wed, 23 Apr 2025 09:16:56 +1000
> Stephen Rothwell <sfr@...b.auug.org.au> wrote:
>
> > On Tue, 22 Apr 2025 11:05:10 -0700 Libo Chen <libo.chen@...cle.com> wrote:
> > >
> > > Just to verify, does the build have commit 'tracing: Verify event
> > > formats that have "%*p.."'[1] or not? I do see linux-next/master has
> > > it but just want to first confirm that.
> >
> > Yes, it does have that commit (since it has been in Linus' tree since
> > April 4).
>
> Looks like we need to teach the verifier about nodemask_pr_args() :-/
>
No, actually, the verifier found a bug!
The event is:
> TRACE_EVENT(sched_skip_cpuset_numa,
>
> TP_PROTO(struct task_struct *tsk, nodemask_t *mem_allowed_ptr),
>
> TP_ARGS(tsk, mem_allowed_ptr),
>
> TP_STRUCT__entry(
> __array( char, comm, TASK_COMM_LEN )
> __field( pid_t, pid )
> __field( pid_t, tgid )
> __field( pid_t, ngid )
> __field( nodemask_t *, mem_allowed_ptr )
> ),
>
> TP_fast_assign(
> memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
> __entry->pid = task_pid_nr(tsk);
> __entry->tgid = task_tgid_nr(tsk);
> __entry->ngid = task_numa_group_id(tsk);
> __entry->mem_allowed_ptr = mem_allowed_ptr;
Here, we copy the address of mem_allowed_ptr into the ring buffer. The
address is copied, not the mask itself.
> ),
>
> TP_printk("comm=%s pid=%d tgid=%d ngid=%d mem_nodes_allowed=%*pbl",
> __entry->comm,
> __entry->pid,
> __entry->tgid,
> __entry->ngid,
> nodemask_pr_args(__entry->mem_allowed_ptr))
> );
The above nodemask_ptr_args() is defined as:
> #define nodemask_pr_args(maskp) __nodemask_pr_numnodes(maskp), \
> __nodemask_pr_bits(maskp)
> static __always_inline unsigned int __nodemask_pr_numnodes(const nodemask_t *m)
> {
> return m ? MAX_NUMNODES : 0;
> }
> static __always_inline const unsigned long *__nodemask_pr_bits(const nodemask_t *m)
> {
> return m ? m->bits : NULL;
> }
Here we see it dereferences the pointer to get bits.
The TP_fast_assign() can happen seconds, minutes, days, months and possibly
years! before the TP_printk() is executed.
There's no guarantee that the mem_allowed_ptr will still be around when
printed and can cause a kernel crash! This is exactly what the verifier is
looking for.
The real fix is not to dereference the pointer and do the conversion in the
TP_fast_assign().
-- Steve
Powered by blists - more mailing lists