Message-ID: <CAGudoHFiTYi8SeFmRDYH-n=yCxr1-78JMrbVaiGNcfnnHVBRrw@mail.gmail.com>
Date: Tue, 11 Feb 2025 00:08:41 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Kees Cook <kees@...nel.org>
Cc: luto@...capital.net, wad@...omium.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] seccomp: avoid the lock trip in seccomp_filter_release in
 common case

On Mon, Feb 10, 2025 at 11:57 PM Kees Cook <kees@...nel.org> wrote:
>
> On Mon, Feb 10, 2025 at 10:05:41PM +0100, Mateusz Guzik wrote:
> > The vast majority of threads don't have any seccomp filters, while the
> > lock taken here is shared between all threads in a given process and
> > frequently used.
> >
> > Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> > ---
> >
> > Here is a splat from parallel thread creation/destruction within one
> > process:
> >
> > bpftrace -e 'kprobe:__pv_queued_spin_lock_slowpath { @[kstack()] = count(); }'
> >
> > [snip]
> > @[
> >     __pv_queued_spin_lock_slowpath+5
> >     _raw_spin_lock_irq+42
> >     seccomp_filter_release+32
> >     do_exit+286
> >     __x64_sys_exit+27
> >     x64_sys_call+4703
> >     do_syscall_64+82
> >     entry_SYSCALL_64_after_hwframe+118
> > ]: 475601
> > @[
> >     __pv_queued_spin_lock_slowpath+5
> >     _raw_spin_lock_irq+42
> >     acct_collect+77
> >     do_exit+1380
> >     __x64_sys_exit+27
> >     x64_sys_call+4703
> >     do_syscall_64+82
> >     entry_SYSCALL_64_after_hwframe+118
> > ]: 478335
> > @[
> >     __pv_queued_spin_lock_slowpath+5
> >     _raw_spin_lock_irq+42
> >     sigprocmask+106
> >     __x64_sys_rt_sigprocmask+121
> >     do_syscall_64+82
> >     entry_SYSCALL_64_after_hwframe+118
> > ]: 1825572
> >
> > There are more spots which take the same lock, with seccomp being top 3.
> >
> > I could not be bothered to bench before/after, but I can do it if you
> > insist. The fact that this codepath is a factor can be seen above.
> >
> > This is a minor patch, I'm not going to insist on it.
> >
> > To my reading seccomp only ever gets populated for current, so this
> > should be perfectly safe to test on exit without any synchronisation.
>
> Unfortunately, this is not true. SECCOMP_FILTER_FLAG_TSYNC may operate
> on non-current. See the paired smp_store_release() and smp_rmb() uses.
>

Ok I see:

                /* Make our new filter tree visible. */
                smp_store_release(&thread->seccomp.filter,
                                  caller->seccomp.filter);
                atomic_set(&thread->seccomp.filter_count,
                           atomic_read(&caller->seccomp.filter_count));
My grep was too naive :)

> > This may need a data_race annotation if some tooling decides to protest.
> >
> >  kernel/seccomp.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index 7bbb408431eb..c839674966e2 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -576,6 +576,9 @@ void seccomp_filter_release(struct task_struct *tsk)
> >       if (WARN_ON((tsk->flags & PF_EXITING) == 0))
> >               return;
> >
> > +     if (tsk->seccomp.filter == NULL)
> > +             return;
> > +
> >       spin_lock_irq(&tsk->sighand->siglock);
> >       orig = tsk->seccomp.filter;
> >       /* Detach task from its filter tree. */
>
> If this can be made provably race-safe, I'd be fine with the idea, but
> at present, all seccomp.filter manipulation happens under
> sighand->siglock...
>

The loop in seccomp_sync_threads actively skips every thread with the
PF_EXITING flag set, while it is an invariant that any thread passed
to seccomp_filter_release has the flag. So one only needs to ensure
visibility.

The flag gets set in exit_signals().

If the target thread is the only one remaining, the sighand lock is not
taken, but there is also nobody left to execute seccomp_sync_threads.

If there are more threads, the flag gets set under the sighand lock,
meaning that by the time the lock is released, any other thread
executing seccomp_sync_threads will spot it and refrain from touching
the exiting thread.

I'm going to need to sleep on it, maybe I'm missing something, but I
think this *does* work as-is, despite my failed initial assertion that
only current touches it.

-- 
Mateusz Guzik <mjguzik gmail.com>
