Message-ID: <CAGudoHFiTYi8SeFmRDYH-n=yCxr1-78JMrbVaiGNcfnnHVBRrw@mail.gmail.com>
Date: Tue, 11 Feb 2025 00:08:41 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Kees Cook <kees@...nel.org>
Cc: luto@...capital.net, wad@...omium.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] seccomp: avoid the lock trip in seccomp_filter_release in
common case
On Mon, Feb 10, 2025 at 11:57 PM Kees Cook <kees@...nel.org> wrote:
>
> On Mon, Feb 10, 2025 at 10:05:41PM +0100, Mateusz Guzik wrote:
> > Vast majority of threads don't have any seccomp filters, all while the
> > lock taken here is shared between all threads in given process and
> > frequently used.
> >
> > Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> > ---
> >
> > Here is a splat from parallel thread creation/destruction within one
> > process:
> >
> > bpftrace -e 'kprobe:__pv_queued_spin_lock_slowpath { @[kstack()] = count(); }'
> >
> > [snip]
> > @[
> > __pv_queued_spin_lock_slowpath+5
> > _raw_spin_lock_irq+42
> > seccomp_filter_release+32
> > do_exit+286
> > __x64_sys_exit+27
> > x64_sys_call+4703
> > do_syscall_64+82
> > entry_SYSCALL_64_after_hwframe+118
> > ]: 475601
> > @[
> > __pv_queued_spin_lock_slowpath+5
> > _raw_spin_lock_irq+42
> > acct_collect+77
> > do_exit+1380
> > __x64_sys_exit+27
> > x64_sys_call+4703
> > do_syscall_64+82
> > entry_SYSCALL_64_after_hwframe+118
> > ]: 478335
> > @[
> > __pv_queued_spin_lock_slowpath+5
> > _raw_spin_lock_irq+42
> > sigprocmask+106
> > __x64_sys_rt_sigprocmask+121
> > do_syscall_64+82
> > entry_SYSCALL_64_after_hwframe+118
> > ]: 1825572
> >
> > There are more spots which take the same lock, with seccomp being top 3.
> >
> > I could not be bothered to bench before/after, but I can do it if you
> > insist. The fact that this codepath is a factor can be seen above.
> >
> > This is a minor patch, I'm not going to insist on it.
> >
> > To my reading seccomp only ever gets populated for current, so this
> > should be perfectly safe to test on exit without any synchronisation.
>
> Unfortunately, this is not true. SECCOMP_FILTER_FLAG_TSYNC may operate
> on non-current. See the paired smp_store_release() and smp_rmb() uses.
>
Ok I see:

	/* Make our new filter tree visible. */
	smp_store_release(&thread->seccomp.filter,
			  caller->seccomp.filter);
	atomic_set(&thread->seccomp.filter_count,
		   atomic_read(&caller->seccomp.filter_count));

My grep was too naive :)
> > This may need a data_race annotation if some tooling decides to protest.
> >
> > kernel/seccomp.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index 7bbb408431eb..c839674966e2 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -576,6 +576,9 @@ void seccomp_filter_release(struct task_struct *tsk)
> > if (WARN_ON((tsk->flags & PF_EXITING) == 0))
> > return;
> >
> > + if (tsk->seccomp.filter == NULL)
> > + return;
> > +
> > spin_lock_irq(&tsk->sighand->siglock);
> > orig = tsk->seccomp.filter;
> > /* Detach task from its filter tree. */
>
> If this can be made provably race-safe, I'd be fine with the idea, but
> at present, all seccomp.filter manipulation happens under
> sighand->siglock...
>
The loop in seccomp_sync_threads() actively skips every thread with the
PF_EXITING flag set, while it is an invariant that any thread passed to
seccomp_filter_release() has the flag set. So one only needs to ensure
visibility.

The flag gets set in exit_signals().

If the target thread is the only one remaining, the sighand lock is not
taken there, but then there is also nobody left to execute
seccomp_sync_threads().

If there are more threads, the flag gets set under the sighand lock,
meaning that by the time the lock is released, any other thread
executing seccomp_sync_threads() will spot the flag and refrain from
messing with the exiting thread.
I'm going to need to sleep on it, maybe I'm missing something, but I
think this *does* work as is, despite my failed initial assertion that
only current messes with the filter.
--
Mateusz Guzik <mjguzik gmail.com>