Message-ID: <202502101454.DE164D34@keescook>
Date: Mon, 10 Feb 2025 14:57:52 -0800
From: Kees Cook <kees@...nel.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: luto@...capital.net, wad@...omium.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] seccomp: avoid the lock trip in seccomp_filter_release in common case

On Mon, Feb 10, 2025 at 10:05:41PM +0100, Mateusz Guzik wrote:
> The vast majority of threads don't have any seccomp filters, yet the
> lock taken here is shared between all threads in a given process and
> is frequently used.
> 
> Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> ---
> 
> Here is a splat from parallel thread creation/destruction within one
> process:
> 
> bpftrace -e 'kprobe:__pv_queued_spin_lock_slowpath { @[kstack()] = count(); }'
> 
> [snip]
> @[
>     __pv_queued_spin_lock_slowpath+5
>     _raw_spin_lock_irq+42
>     seccomp_filter_release+32
>     do_exit+286
>     __x64_sys_exit+27
>     x64_sys_call+4703
>     do_syscall_64+82
>     entry_SYSCALL_64_after_hwframe+118
> ]: 475601
> @[
>     __pv_queued_spin_lock_slowpath+5
>     _raw_spin_lock_irq+42
>     acct_collect+77
>     do_exit+1380
>     __x64_sys_exit+27
>     x64_sys_call+4703
>     do_syscall_64+82
>     entry_SYSCALL_64_after_hwframe+118
> ]: 478335
> @[
>     __pv_queued_spin_lock_slowpath+5
>     _raw_spin_lock_irq+42
>     sigprocmask+106
>     __x64_sys_rt_sigprocmask+121
>     do_syscall_64+82
>     entry_SYSCALL_64_after_hwframe+118
> ]: 1825572
> 
> There are more spots which take the same lock, with seccomp being among
> the top 3.
> 
> I could not be bothered to bench before/after, but I can do it if you
> insist. That this codepath is a factor in the contention can be seen in
> the splat above.
> 
> This is a minor patch, I'm not going to insist on it.
> 
> To my reading, seccomp only ever gets populated for current, so this
> should be perfectly safe to test on exit without any synchronisation.

Unfortunately, this is not true. SECCOMP_FILTER_FLAG_TSYNC may operate
on a task other than current. See the paired smp_store_release() and
smp_rmb() uses.
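
For readers following along, the pairing referenced above is the usual
release/acquire publication idiom. The functions below are purely an
illustrative sketch with made-up names (not the actual seccomp code,
which uses smp_rmb()/READ_ONCE() on the read side):

#include <linux/sched.h>
#include <linux/seccomp.h>

/*
 * Illustrative sketch only: a writer (e.g. the TSYNC path) installs a
 * fully initialized filter on another thread, and an unlocked reader on
 * that thread must not observe the pointer before the filter's contents.
 */
static void sketch_publish_filter(struct task_struct *thread,
				  struct seccomp_filter *filter)
{
	/* All stores initializing *filter must be visible before the
	 * pointer itself is published. */
	smp_store_release(&thread->seccomp.filter, filter);
}

static struct seccomp_filter *sketch_read_filter(struct task_struct *tsk)
{
	/* Pairs with the release above, so a non-NULL result is seen
	 * fully initialized by subsequent loads. */
	return smp_load_acquire(&tsk->seccomp.filter);
}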

> This may need a data_race annotation if some tooling decides to protest.
> 
>  kernel/seccomp.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 7bbb408431eb..c839674966e2 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -576,6 +576,9 @@ void seccomp_filter_release(struct task_struct *tsk)
>  	if (WARN_ON((tsk->flags & PF_EXITING) == 0))
>  		return;
>  
> +	if (tsk->seccomp.filter == NULL)
> +		return;
> +
>  	spin_lock_irq(&tsk->sighand->siglock);
>  	orig = tsk->seccomp.filter;
>  	/* Detach task from its filter tree. */

If this can be made provably race-safe, I'd be fine with the idea, but
at present, all seccomp.filter manipulation happens under
sighand->siglock...
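
For concreteness, the shape under discussion would presumably be an
annotated lockless peek in front of the existing locked path, roughly as
sketched below. Whether such an unlocked read is actually safe against
TSYNC is exactly the open question in this thread, so this is not a
vetted change:

/*
 * Sketch only, not a reviewed patch: a KCSAN-annotated lockless fast
 * path in front of the existing siglock-protected release logic.
 */
void seccomp_filter_release(struct task_struct *tsk)
{
	struct seccomp_filter *orig;

	if (WARN_ON((tsk->flags & PF_EXITING) == 0))
		return;

	/* Unlocked peek at a field otherwise written under siglock;
	 * READ_ONCE() documents (and annotates) the racy read. */
	if (!READ_ONCE(tsk->seccomp.filter))
		return;

	spin_lock_irq(&tsk->sighand->siglock);
	/* Re-read under the lock that all writers hold before detaching. */
	orig = tsk->seccomp.filter;
	/* ... detach the task from its filter tree as before ... */
	spin_unlock_irq(&tsk->sighand->siglock);
}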

-Kees

-- 
Kees Cook
