lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 2 Feb 2021 11:02:58 -0800
From:   Kees Cook <keescook@...omium.org>
To:     wanghongzhe <wanghongzhe@...wei.com>
Cc:     luto@...capital.net, andrii@...nel.org, ast@...nel.org,
        bpf@...r.kernel.org, daniel@...earbox.net,
        john.fastabend@...il.com, kafai@...com, kpsingh@...nel.org,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        songliubraving@...com, wad@...omium.org, yhs@...com
Subject: Re: [PATCH v1 1/1] Firstly, as Andy mentioned, this should be
 smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread
 situation, and rmb() is a mandatory barrier which should not be used to
 control SMP effects, since mandatory barriers impose unnecessary overhead on
 both SMP and UP systems, as kernel Documentation said.

On Tue, Feb 02, 2021 at 06:13:07PM +0800, wanghongzhe wrote:
> Secondly, the smp_rmb() should be put between reading SYSCALL_WORK_SECCOMP and reading
> seccomp.mode, not between reading seccomp.mode and seccomp->filter, to make
> sure that any changes to mode from another thread have been seen after
> SYSCALL_WORK_SECCOMP was seen, as the original comment shown. This issue seems to be
> misintroduced at 13aa72f0fd0a9f98a41cefb662487269e2f1ad65 which aims to
> refactor the filter callback and the API. So the intuitive solution is to put
> it back like:
> 
> Thirdly, however, we can go further to improve the performace of checking
> syscall, considering that smp_rmb is always executed on the syscall-check
> path at each time for both FILTER and STRICT check while the TSYNC case
> which may lead to race condition is just a rare situation, and that in
> some arch like Arm64 smp_rmb is dsb(ishld) not a cheap barrier() in x86-64.
> 
> As a result, smp_rmb() should only be executed when necessary, e.g, it is
> only necessary when current thread's mode is SECCOMP_MODE_DISABLED at the
> first TYSNCed time, because after that the current thread's mode will always
> be SECCOMP_MODE_FILTER (and SYSCALL_WORK_SECCOMP will always be set) and can not be
> changed anymore by anyone. In other words, after that, any thread can not
> change the mode (and SYSCALL_WORK_SECCOMP), so the race condition disappeared, and
> no more smb_rmb() needed ever.
> 
> So the solution is to read mode again behind smp_rmb() after the mode is seen
> as SECCOMP_MODE_DISABLED by current thread at the first TSYNCed time, and if
> the new mode don't equals to SECCOMP_MODE_FILTER, do BUG(), go to FILTER path
> otherwise.
> 
> RFC -> v1:
>  - replace rmb() with smp_rmb()
>  - move the smp_rmb() logic to the middle between SYSCALL_WORK_SECCOMP and mode
> 
> Signed-off-by: wanghongzhe <wanghongzhe@...wei.com>
> Reviewed-by: Andy Lutomirski <luto@...capital.net>
> ---
>  kernel/seccomp.c | 25 +++++++++++++++++--------
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 952dc1c90229..a621fb913ec6 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -1160,12 +1160,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>  	int data;
>  	struct seccomp_data sd_local;
>  
> -	/*
> -	 * Make sure that any changes to mode from another thread have
> -	 * been seen after SYSCALL_WORK_SECCOMP was seen.
> -	 */
> -	rmb();
> -
>  	if (!sd) {
>  		populate_seccomp_data(&sd_local);
>  		sd = &sd_local;
> @@ -1289,7 +1283,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>  
>  int __secure_computing(const struct seccomp_data *sd)
>  {
> -	int mode = current->seccomp.mode;
>  	int this_syscall;
>  
>  	if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> @@ -1299,10 +1292,26 @@ int __secure_computing(const struct seccomp_data *sd)
>  	this_syscall = sd ? sd->nr :
>  		syscall_get_nr(current, current_pt_regs());
>  
> -	switch (mode) {
> +	/*
> +	 * Make sure that any changes to mode from another thread have
> +	 * been seen after SYSCALL_WORK_SECCOMP was seen.
> +	 */
> +	smp_rmb();

Let's start with a patch that just replaces rmb() with smp_rmb() and
then work on optimizing. Can you provide performance numbers that show
rmb() (and soon smp_rmb()) is causing actual problems here?

> +
> +	switch (current->seccomp.mode) {
>  	case SECCOMP_MODE_STRICT:
>  		__secure_computing_strict(this_syscall);  /* may call do_exit */
>  		return 0;
> +	/*
> +	 * Make sure that change to mode (from SECCOMP_MODE_DISABLED to
> +	 * SECCOMP_MODE_FILTER) from another thread using TSYNC ability
> +	 * have been seen after SYSCALL_WORK_SECCOMP was seen. Read mode again behind
> +	 * smp_rmb(), if it equals SECCOMP_MODE_FILTER, go to the right path.
> +	 */
> +	case SECCOMP_MODE_DISABLED:
> +		smp_rmb();
> +		if (unlikely(current->seccomp.mode != SECCOMP_MODE_FILTER))
> +			BUG();

BUG() should never be used[1]. This is a recoverable situation, I think, and
should be handled as such.

-Kees

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on

>  	case SECCOMP_MODE_FILTER:
>  		return __seccomp_filter(this_syscall, sd, false);
>  	default:
> -- 
> 2.19.1
> 

-- 
Kees Cook

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ