Message-ID: <20250428125600.1f50f476@gandalf.local.home>
Date: Mon, 28 Apr 2025 12:56:00 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, Masami
 Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>,
 Andrew Morton <akpm@...ux-foundation.org>, Josh Poimboeuf
 <jpoimboe@...nel.org>, x86@...nel.org, Peter Zijlstra
 <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, Arnaldo Carvalho de
 Melo <acme@...nel.org>, Indu Bhagat <indu.bhagat@...cle.com>, Alexander
 Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa
 <jolsa@...nel.org>, Namhyung Kim <namhyung@...nel.org>, Ian Rogers
 <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>,
 linux-perf-users@...r.kernel.org, Mark Brown <broonie@...nel.org>,
 linux-toolchains@...r.kernel.org, Jordan Rome <jordalgo@...a.com>, Sam
 James <sam@...too.org>, Andrii Nakryiko <andrii.nakryiko@...il.com>, Jens
 Remus <jremus@...ux.ibm.com>, Florian Weimer <fweimer@...hat.com>, Andy
 Lutomirski <luto@...nel.org>, Weinan Liu <wnliu@...gle.com>, Blake Jones
 <blakejones@...gle.com>, Beau Belgrave <beaub@...ux.microsoft.com>, "Jose
 E. Marchesi" <jemarch@....org>, Alexander Aring <aahringo@...hat.com>
Subject: Re: [PATCH v5 3/9] unwind deferred: Use bitmask to determine which
 callbacks to call

On Mon, 28 Apr 2025 12:33:50 -0400
Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:

> On 2025-04-24 15:24, Steven Rostedt wrote:
> > From: Steven Rostedt <rostedt@...dmis.org>
> > 
> > In order to know which registered callback requested a stacktrace for when
> > the task goes back to user space, add a bitmask for all registered
> > tracers. The bitmask is the size of log, which means that on a 32 bit  
> 
> size of long
> 

Sure


> > --- a/include/linux/unwind_deferred.h
> > +++ b/include/linux/unwind_deferred.h
> > @@ -13,6 +13,7 @@ typedef void (*unwind_callback_t)(struct unwind_work *work, struct unwind_stackt
> >   struct unwind_work {
> >   	struct list_head		list;
> >   	unwind_callback_t		func;
> > +	int				bit;  
> 
> int or unsigned int ?
> 
> Rename "bit" to "requester_id" ?

Perhaps just "id", as this is only internal and shouldn't be touched.

> 
> >   };
> >   
> >   #ifdef CONFIG_UNWIND_USER
> > diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
> > index 2afd197da2ef..f505cb1766de 100644
> > --- a/kernel/unwind/deferred.c
> > +++ b/kernel/unwind/deferred.c
> > @@ -26,6 +26,7 @@ static DEFINE_PER_CPU(u64, unwind_ctx_ctr);
> >   /* Guards adding to and reading the list of callbacks */
> >   static DEFINE_MUTEX(callback_mutex);
> >   static LIST_HEAD(callbacks);
> > +static unsigned long unwind_mask;  
> 
> Perhaps "reserved_unwind_mask" ?

Sure.

> 
> >   
> >   /*
> >    * The context cookie is a unique identifier that is assigned to a user
> > @@ -135,6 +136,7 @@ static void unwind_deferred_task_work(struct callback_head *head)
> >   	struct unwind_task_info *info = container_of(head, struct unwind_task_info, work);
> >   	struct unwind_stacktrace trace;
> >   	struct unwind_work *work;
> > +	struct task_struct *task = current;
> >   	u64 cookie;
> >   
> >   	if (WARN_ON_ONCE(!info->pending))
> > @@ -156,7 +158,10 @@ static void unwind_deferred_task_work(struct callback_head *head)
> >   
> >   	guard(mutex)(&callback_mutex);
> >   	list_for_each_entry(work, &callbacks, list) {
> > -		work->func(work, &trace, cookie);
> > +		if (task->unwind_mask & (1UL << work->bit)) {
> > +			work->func(work, &trace, cookie);
> > +			clear_bit(work->bit, &current->unwind_mask);
> > +		}  
> 
> You could change this list of callbacks for an array of pointers,
> indexed by "requester_id".
> 
> Then you can do a for each bit on task->unwind_mask, and all bits
> that match end up calling the callback for the matching array index.

Yeah, I thought of this, but that's just an optimization, and something I'll
probably add as a separate patch.
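
Something like this, roughly (just a sketch; the unwind_callbacks[] array,
UNWIND_MAX_CALLBACKS and the function name are made up for illustration and
are not part of this series):

	#include <linux/bitops.h>
	#include <linux/unwind_deferred.h>

	#define UNWIND_MAX_CALLBACKS	BITS_PER_LONG

	/* Registered tracers, indexed by their assigned id */
	static struct unwind_work *unwind_callbacks[UNWIND_MAX_CALLBACKS];

	static void unwind_call_requesters(struct unwind_stacktrace *trace, u64 cookie)
	{
		unsigned long *mask = &current->unwind_mask;
		int bit;

		/* Only visit the tracers that actually requested a stacktrace */
		for_each_set_bit(bit, mask, UNWIND_MAX_CALLBACKS) {
			struct unwind_work *work = unwind_callbacks[bit];

			if (work)
				work->func(work, trace, cookie);
			clear_bit(bit, mask);
		}
	}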

> > @@ -244,14 +254,18 @@ int unwind_deferred_request(struct unwind_work *work, u64 *cookie)
> >   
> >   	*cookie = get_cookie(info);
> >   
> > +	/* This is already queued */
> > +	if (current->unwind_mask & (1UL << work->bit))
> > +		return 0;
> > +
> >   	/* callback already pending? */
> >   	pending = READ_ONCE(info->pending);
> >   	if (pending)
> > -		return 0;
> > +		goto out;
> >   
> >   	/* Claim the work unless an NMI just now swooped in to do so. */
> >   	if (!try_cmpxchg(&info->pending, &pending, 1))  
> 
> Not that it necessarily matters performance wise here, but can this be a
> try_cmpxchg_local if we're working on the task struct and only expecting
> interruption from NMIs ?

Hmm, sure.
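
That would just be (sketch of the change on top of this patch):

	/* Claim the work unless an NMI just now swooped in to do so. */
	if (!try_cmpxchg_local(&info->pending, &pending, 1))
		goto out;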

> 
> > -		return 0;
> > +		goto out;
> >   
> >   	/* The work has been claimed, now schedule it. */
> >   	ret = task_work_add(current, &info->work, TWA_RESUME);
> > @@ -260,16 +274,29 @@ int unwind_deferred_request(struct unwind_work *work, u64 *cookie)
> >   		return ret;
> >   	}
> >   
> > + out:
> > +	set_bit(work->bit, &current->unwind_mask);
> > +
> >   	return 0;
> >   }
> >   
> >   void unwind_deferred_cancel(struct unwind_work *work)
> >   {
> > +	struct task_struct *g, *t;
> > +
> >   	if (!work)
> >   		return;
> >   
> >   	guard(mutex)(&callback_mutex);
> >   	list_del(&work->list);
> > +
> > +	clear_bit(work->bit, &unwind_mask);
> > +
> > +	guard(rcu)();
> > +	/* Clear this bit from all threads */
> > +	for_each_process_thread(g, t) {
> > +		clear_bit(work->bit, &t->unwind_mask);
> > +	}  
> 
> It is enough to guard with RCU ? See syscall_regfunc() from
> tracepoint.c where we do:
> 
>                  read_lock(&tasklist_lock);
>                  for_each_process_thread(p, t) {
>                          set_task_syscall_work(t, SYSCALL_TRACEPOINT);
>                  }
>                  read_unlock(&tasklist_lock);
> 
> To prevent concurrent fork from adding threads while we
> iterate, thus opening the possibility of missing a clear
> due to a concurrent fork + set bit.

A set_bit would only happen if the callback was live and accepting new
callback requests. It's a bug for a tracer to call unwind_deferred_cancel()
and then call unwind_deferred_request() (which would set the bit). We could
possibly set the tracer's unwind descriptor id to -1 and do a
WARN_ON_ONCE() in unwind_deferred_request() if the tracer's id is negative.

The loop is run under the callback_mutex, so no new tracer can register and
be assigned that bit while it runs.

The RCU lock is just to make sure the tasks that existed when the loop
started don't disappear before the loop ends.
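
Catching that misuse could look something like this (sketch only; the
-EINVAL return and where the check sits are guesses, not part of this
series):

	void unwind_deferred_cancel(struct unwind_work *work)
	{
		struct task_struct *g, *t;
		int bit;

		if (!work)
			return;

		bit = work->bit;

		guard(mutex)(&callback_mutex);
		list_del(&work->list);

		/* Catch a tracer that calls unwind_deferred_request() after canceling */
		work->bit = -1;

		clear_bit(bit, &unwind_mask);

		guard(rcu)();
		/* Clear this bit from all threads */
		for_each_process_thread(g, t)
			clear_bit(bit, &t->unwind_mask);
	}

	int unwind_deferred_request(struct unwind_work *work, u64 *cookie)
	{
		/* Requesting a stacktrace after canceling is a tracer bug */
		if (WARN_ON_ONCE(work->bit < 0))
			return -EINVAL;
		...
	}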

-- Steve



> 
> Thanks,
> 
> Mathieu
> 
> >   }
> >   
> >   int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
> > @@ -277,6 +304,14 @@ int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
> >   	memset(work, 0, sizeof(*work));
> >   
> >   	guard(mutex)(&callback_mutex);
> > +
> > +	/* See if there's a bit in the mask available */
> > +	if (unwind_mask == ~0UL)
> > +		return -EBUSY;
> > +
> > +	work->bit = ffz(unwind_mask);
> > +	unwind_mask |= 1UL << work->bit;
> > +
> >   	list_add(&work->list, &callbacks);
> >   	work->func = func;
> >   	return 0;
> > @@ -288,6 +323,7 @@ void unwind_task_init(struct task_struct *task)
> >   
> >   	memset(info, 0, sizeof(*info));
> >   	init_task_work(&info->work, unwind_deferred_task_work);
> > +	task->unwind_mask = 0;
> >   }
> >   
> >   void unwind_task_free(struct task_struct *task)  
> 
> 

