lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202007091549.1E457D4@keescook>
Date:   Thu, 9 Jul 2020 16:33:57 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Gabriel Krisman Bertazi <krisman@...labora.com>
Cc:     tglx@...utronix.de, linux-kernel@...r.kernel.org,
        kernel@...labora.com, Matthew Wilcox <willy@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        Paul Gofman <gofmanp@...il.com>
Subject: Re: [PATCH v2] kernel: Implement selective syscall userspace
 redirection

On Thu, Jul 09, 2020 at 12:38:40AM -0400, Gabriel Krisman Bertazi wrote:
> [...]
> +config SYSCALL_USER_DISPATCH
> +	bool "Support rejecting syscalls not coming from a dispatcher"

bike shed: this doesn't really describe why it's useful. Maybe:

	bool "Support syscall redirection to userspace dispatcher"

> +	depends on HAVE_ARCH_SYSCALL_USER_DISPATCH
> +	help
> +	  Enable tasks to ask the kernel to redirect syscalls not
> +	  issued from a predefined dispatcher back to userspace,
> +	  depending on a userspace selector.

	depending on a userspace memory selector.

?

> [...]
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -138,6 +138,11 @@ static long syscall_trace_enter(struct pt_regs *regs)
>  			return -1L;
>  	}
>  
> +	if (work & _TIF_SYSCALL_USER_DISPATCH) {
> +		if (do_syscall_user_dispatch(regs))
> +			return -1L;
> +	}
> +

Nice; I like this! It's very small, and now I want to refactor seccomp
to be so pretty. :)

> [...]
> diff --git a/fs/exec.c b/fs/exec.c
> index e6e8a9a70327..44f0ce352a0d 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1502,6 +1502,8 @@ void setup_new_exec(struct linux_binprm * bprm)
>  
>  	arch_setup_new_exec();
>  
> +	clear_tsk_syscall_user_dispatch(me);
> +

To keep this as arch-agnostic as possible, I actually think this belongs
in begin_new_exec() instead, after the personality clearing. If it were
to be less arch-agnostic, I'd recommend it live in arch_setup_new_exec()
(which most other TIF flags get cleared), but I'd like to have this
feature as a good example of an "easy" refactor into arch-agnostic in
the future. :P

> [...]
> @@ -285,6 +285,7 @@ typedef struct siginfo {
>   */
>  #define SYS_SECCOMP	1	/* seccomp triggered */
>  #define NSIGSYS		1
> +#define SYS_USER_REDIRECT 2

More than just what willy suggested, but you also need to bump NSIGSYS
(I missed that in the RFC). This should read as:

#define SYS_SECCOMP		1	/* seccomp triggered */
#define SYS_USER_REDIRECT	2
#define NSIGSYS			2

(i.e. NSIGSYS is "how many si codes are there for sigsys?")

> [...]
> +int do_syscall_user_dispatch(struct pt_regs *regs)
> +{
> +	int state;
> +
> +	if (current->syscall_dispatch.dispatcher == instruction_pointer(regs))
> +		return 0;

Just to clarify what willy was talking about -- since you're using
"dispatcher" as a scalar (and not dereferencing it, etc), it can just
stay "unsigned long" without __user.

In the documentation for the future "range inclusive" check, maybe also
note that it's the inclusive ranger covering the _starting_ address of
the syscall entry, in case anyone thinks they need to decode instruction
lengths to get the right range, which they don't and I imagine you don't
care about.

> +
> +	if (current->syscall_dispatch.selector) {
> +		if (__get_user(state, current->syscall_dispatch.selector))
> +			do_exit(SIGSEGV);
> +
> +		switch (state) {
> +		case PR_SYSCALL_DISPATCH_DISABLE:
> +			return 0;
> +		case PR_SYSCALL_DISPATCH_ENABLE:
> +			break;
> +		default:
> +			do_exit(SIGSEGV);
> +		}
> +	}
> +
> +	syscall_rollback(current, regs);
> +	trigger_sigsys(regs);
> +
> +	return 1;

How should do_syscall_user_dispatch() approach branch hinting? For example,
is dispatcher == instruction_pointer going to be the common case for
users of this? I would expect not. So maybe, at the top:

	if (unlikely(current->syscall_dispatch.dispatcher == instruction_pointer(regs)))
		return 0;

and what about the selector? I assume it would be common for the selector
to be set, and enabled:

	if (likely(current->syscall_dispatch.selector)) {
		if (__get_user(state, current->syscall_dispatch.selector))
			do_exit(SIGSEGV);

		if (unlikely(state != PR_SYSCALL_DISPATCH_ENABLE)) {
			if (likely(state == PR_SYSCALL_DISPATCH_DISABLE))
				return 0;
			do_exit(SIGSEGV);
		}
	}

	syscall_rollback...

Or, perhaps micro-optimization doesn't matter at all here give the
overhead of signal delivery, and you can just ignore me and leave this
as-is, which is fine too. :)


This continues to look really nice. Very simple, very powerful. I think
I'd like to see one more thing: a selftest! I think it should be very
easy to add; model it after the seccomp selftests:

tools/testing/selftests/seccomp/seccomp_bpf.c

The testing harness there should make it very easy to produce a test,
and it's easy to wire up to the Makefiles. I'm happy to help point you
in the right directions. If you want, you could even share seccomp's
directory, but maybe you want your own. :)

-- 
Kees Cook

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ