linux-kernel - Re: [PATCH v6 3/3] Documentation: prctl/seccomp

Message-ID: <4F271DFE.3080202@linux.vnet.ibm.com>
Date:	Mon, 30 Jan 2012 17:47:26 -0500
From:	Corey Bryant <coreyb@...ux.vnet.ibm.com>
To:	Will Drewry <wad@...omium.org>
CC:	linux-kernel@...r.kernel.org, keescook@...omium.org,
	john.johansen@...onical.com, serge.hallyn@...onical.com,
	pmoore@...hat.com, eparis@...hat.com, djm@...drot.org,
	torvalds@...ux-foundation.org, segoon@...nwall.com,
	rostedt@...dmis.org, jmorris@...ei.org, scarybeasts@...il.com,
	avi@...hat.com, penberg@...helsinki.fi, viro@...iv.linux.org.uk,
	luto@....edu, mingo@...e.hu, akpm@...ux-foundation.org,
	khilman@...com, borislav.petkov@....com, amwang@...hat.com,
	oleg@...hat.com, ak@...ux.intel.com, eric.dumazet@...il.com,
	gregkh@...e.de, dhowells@...hat.com, daniel.lezcano@...e.fr,
	linux-fsdevel@...r.kernel.org,
	linux-security-module@...r.kernel.org, olofj@...omium.org,
	mhalcrow@...gle.com, dlaor@...hat.com, corbet@....net,
	alan@...rguk.ukuu.org.uk, indan@....nu, mcgrathr@...omium.org
Subject: Re: [PATCH v6 3/3] Documentation: prctl/seccomp_filter



On 01/28/2012 05:11 PM, Will Drewry wrote:
> Documents how system call filtering using Berkeley Packet
> Filter programs works and how it may be used.
> Includes an example for x86 (32-bit) and a semi-generic
> example using an example code generator.
>
> v6: - tweak the language to note the requirement of
>        PR_SET_NO_NEW_PRIVS being called prior to use. (luto@....edu)
> v5: - update sample to use system call arguments
>      - adds a "fancy" example using a macro-based generator
>      - cleaned up bpf in the sample
>      - update docs to mention arguments
>      - fix prctl value (eparis@...hat.com)
>      - language cleanup (rdunlap@...otime.net)
> v4: - update for no_new_privs use
>      - minor tweaks
> v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@...otime.net)
>      - document use of tentative always-unprivileged
>      - guard sample compilation for i386 and x86_64
> v2: - move code to samples (corbet@....net)
>
> Signed-off-by: Will Drewry <wad@...omium.org>
> ---
>   Documentation/prctl/seccomp_filter.txt |  100 +++++++++++++++
>   samples/Makefile                       |    2 +-
>   samples/seccomp/Makefile               |   27 ++++
>   samples/seccomp/bpf-direct.c           |   77 +++++++++++
>   samples/seccomp/bpf-fancy.c            |   95 ++++++++++++++
>   samples/seccomp/bpf-helper.c           |   89 +++++++++++++
>   samples/seccomp/bpf-helper.h           |  219 ++++++++++++++++++++++++++++++++
>   7 files changed, 608 insertions(+), 1 deletions(-)
>   create mode 100644 Documentation/prctl/seccomp_filter.txt
>   create mode 100644 samples/seccomp/Makefile
>   create mode 100644 samples/seccomp/bpf-direct.c
>   create mode 100644 samples/seccomp/bpf-fancy.c
>   create mode 100644 samples/seccomp/bpf-helper.c
>   create mode 100644 samples/seccomp/bpf-helper.h
>
> diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
> new file mode 100644
> index 0000000..4ad7649
> --- /dev/null
> +++ b/Documentation/prctl/seccomp_filter.txt
> @@ -0,0 +1,100 @@
> +		Seccomp filtering
> +		=================
> +
> +Introduction
> +------------
> +
> +A large number of system calls are exposed to every userland process
> +with many of them going unused for the entire lifetime of the process.
> +As system calls change and mature, bugs are found and eradicated.  A
> +certain subset of userland applications benefit by having a reduced set
> +of available system calls.  The resulting set reduces the total kernel
> +surface exposed to the application.  System call filtering is meant for
> +use with those applications.
> +
> +Seccomp filtering provides a means for a process to specify a filter for
> +incoming system calls.  The filter is expressed as a Berkeley Packet
> +Filter (BPF) program, as with socket filters, except that the data
> +operated on is related to the system call being made: system call
> +number, and the system call arguments.  This allows for expressive
> +filtering of system calls using a filter program language with a long
> +history of being exposed to userland and a straightforward data set.
> +
> +Additionally, BPF makes it impossible for users of seccomp to fall prey
> +to time-of-check-time-of-use (TOCTOU) attacks that are common in system
> +call interposition frameworks.  BPF programs may not dereference
> +pointers which constrains all filters to solely evaluating the system
> +call arguments directly.
> +
> +What it isn't
> +-------------
> +
> +System call filtering isn't a sandbox.  It provides a clearly defined
> +mechanism for minimizing the exposed kernel surface.  Beyond that,
> +policy for logical behavior and information flow should be managed with
> +a combination of other system hardening techniques and, potentially, an
> +LSM of your choosing.  Expressive, dynamic filters provide further options down
> +this path (avoiding pathological sizes or selecting which of the multiplexed
> +system calls in socketcall() is allowed, for instance) which could be
> +construed, incorrectly, as a more complete sandboxing solution.
> +
> +Usage
> +-----
> +
> +An additional seccomp mode is added, but it is not set directly by
> +the consuming process.  The new mode, '2', is only available if
> +CONFIG_SECCOMP_FILTER is set, and it is enabled using prctl with the
> +PR_ATTACH_SECCOMP_FILTER argument.
> +
> +Interacting with seccomp filters is done using one prctl(2) call.
> +
> +PR_ATTACH_SECCOMP_FILTER:
> +	Allows the specification of a new filter using a BPF program.
> +	The BPF program will be executed over struct seccomp_filter_data
> +	reflecting the system call number, arguments, and other
> +	metadata.  To allow a system call, SECCOMP_BPF_ALLOW must be
> +	returned.  At present, all other return values result in the
> +	system call being blocked, but it is recommended to return
> +	SECCOMP_BPF_DENY in those cases.  This will allow for future
> +	custom return values to be introduced, if ever desired.
> +
> +	Usage:
> +		prctl(PR_ATTACH_SECCOMP_FILTER, prog);
> +
> +	The 'prog' argument is a pointer to a struct sock_fprog which will
> +	contain the filter program.  If the program is invalid, the call
> +	will return -1 and set errno to EINVAL.
> +
> +	Note, is_compat_task is also tracked for the @prog.  This means
> +	that once set the calling task will have all of its system calls
> +	blocked if it switches its system call ABI.
> +
> +	If fork/clone and execve are allowed by @prog, any child processes will
> +	be constrained to the same filters and system call ABI as the parent.
> +
> +	Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
> +	run with CAP_SYS_ADMIN privileges in its namespace.  If these are not
> +	true, -EACCES will be returned.  This requirement ensures that filter
> +	programs cannot be applied to child processes with greater privileges
> +	than the task that installed them.
> +
> +	Additionally, if prctl(2) is allowed by the attached filter,
> +	additional filters may be layered on which will increase evaluation
> +	time, but allow for further decreasing the attack surface during
> +	execution of a process.
> +
> +The above call returns 0 on success and non-zero on error.
> +
> +Example
> +-------
> +
> +The samples/seccomp/ directory contains both a 32-bit specific example
> +and a more generic example of a higher level macro interface for BPF
> +program generation.
> +
> +Adding architecture support
> +---------------------------
> +
> +Any platform with seccomp support will support seccomp filters as long
> +as CONFIG_SECCOMP_FILTER is enabled and the architecture has implemented
> +syscall_get_arguments.
> diff --git a/samples/Makefile b/samples/Makefile
> index 6280817..f29b19c 100644
> --- a/samples/Makefile
> +++ b/samples/Makefile
> @@ -1,4 +1,4 @@
>   # Makefile for Linux samples code
>
>   obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ tracepoints/ trace_events/ \
> -			   hw_breakpoint/ kfifo/ kdb/ hidraw/
> +			   hw_breakpoint/ kfifo/ kdb/ hidraw/ seccomp/
> diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
> new file mode 100644
> index 0000000..0298c6f
> --- /dev/null
> +++ b/samples/seccomp/Makefile
> @@ -0,0 +1,27 @@
> +# kbuild trick to avoid linker error. Can be omitted if a module is built.
> +obj- := dummy.o
> +
> +hostprogs-y := bpf-fancy
> +bpf-fancy-objs := bpf-fancy.o bpf-helper.o
> +
> +HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include
> +HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include
> +HOSTCFLAGS_bpf-helper.o += -I$(objtree)/usr/include
> +HOSTCFLAGS_bpf-helper.o += -idirafter $(objtree)/include
> +
> +# bpf-direct.c is x86-only.
> +ifeq ($(filter-out x86_64 i386,$(KBUILD_BUILDHOST)),)
> +# List of programs to build
> +hostprogs-y += bpf-direct
> +bpf-direct-objs := bpf-direct.o
> +endif
> +
> +# Tell kbuild to always build the programs
> +always := $(hostprogs-y)
> +
> +HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include
> +HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include
> +ifeq ($(KBUILD_BUILDHOST),x86_64)
> +HOSTCFLAGS_bpf-direct.o += -m32
> +HOSTLOADLIBES_bpf-direct += -m32
> +endif
> diff --git a/samples/seccomp/bpf-direct.c b/samples/seccomp/bpf-direct.c
> new file mode 100644
> index 0000000..d799244
> --- /dev/null
> +++ b/samples/seccomp/bpf-direct.c
> @@ -0,0 +1,77 @@
> +/*
> + * 32-bit seccomp filter example with BPF macros
> + *
> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@...omium.org>
> + * Author: Will Drewry <wad@...omium.org>
> + *
> + * The code may be used by anyone for any purpose,
> + * and can serve as a starting point for developing
> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
> + */
> +
> +#include <linux/filter.h>
> +#include <linux/ptrace.h>
> +#include <linux/seccomp_filter.h>
> +#include <linux/unistd.h>
> +#include <stdio.h>
> +#include <stddef.h>
> +#include <sys/prctl.h>
> +#include <unistd.h>
> +
> +#ifndef PR_ATTACH_SECCOMP_FILTER
> +#	define PR_ATTACH_SECCOMP_FILTER 37
> +#endif
> +
> +#define syscall_arg(_n) (offsetof(struct seccomp_filter_data, args[_n].lo32))
> +#define nr (offsetof(struct seccomp_filter_data, syscall_nr))
> +
> +static int install_filter(void)
> +{
> +	struct seccomp_filter_block filter[] = {
> +		/* Grab the system call number */
> +		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, nr),
> +		/* Jump table for the allowed syscalls */
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 10, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 9, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 8, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 7, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 2, 6),
> +
> +		/* Check that read is only using stdin. */
> +		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 3, 4),
> +
> +		/* Check that write is only using stdout/stderr */
> +		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 0, 1),
> +
> +		BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_ALLOW),
> +		BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_DENY),
> +	};
> +	struct seccomp_fprog prog = {
> +		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
> +		.filter = filter,
> +	};
> +	if (prctl(PR_ATTACH_SECCOMP_FILTER, &prog)) {
> +		perror("prctl");
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +#define payload(_c) (_c), sizeof((_c))
> +int main(int argc, char **argv)
> +{
> +	char buf[4096];
> +	ssize_t bytes = 0;
> +	if (install_filter())
> +		return 1;
> +	syscall(__NR_write, STDOUT_FILENO,
> +		payload("OHAI! WHAT IS YOUR NAME? "));
> +	bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf));
> +	syscall(__NR_write, STDOUT_FILENO, payload("HELLO, "));
> +	syscall(__NR_write, STDOUT_FILENO, buf, bytes);
> +	return 0;
> +}
> diff --git a/samples/seccomp/bpf-fancy.c b/samples/seccomp/bpf-fancy.c
> new file mode 100644
> index 0000000..1318b1a
> --- /dev/null
> +++ b/samples/seccomp/bpf-fancy.c
> @@ -0,0 +1,95 @@
> +/*
> + * Seccomp BPF example using a macro-based generator.
> + *
> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@...omium.org>
> + * Author: Will Drewry <wad@...omium.org>
> + *
> + * The code may be used by anyone for any purpose,
> + * and can serve as a starting point for developing
> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
> + */
> +
> +#include <linux/seccomp_filter.h>
> +#include <linux/unistd.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/prctl.h>
> +#include <unistd.h>
> +
> +#include "bpf-helper.h"
> +
> +#ifndef PR_ATTACH_SECCOMP_FILTER
> +#	define PR_ATTACH_SECCOMP_FILTER 37
> +#endif
> +
> +int main(int argc, char **argv)
> +{
> +	struct bpf_labels l;
> +	static const char msg1[] = "Please type something: ";
> +	static const char msg2[] = "You typed: ";
> +	char buf[256];
> +	struct seccomp_filter_block filter[] = {
> +		LOAD_SYSCALL_NR,
> +		SYSCALL(__NR_exit, ALLOW),
> +		SYSCALL(__NR_exit_group, ALLOW),
> +		SYSCALL(__NR_write, JUMP(&l, write_fd)),
> +		SYSCALL(__NR_read, JUMP(&l, read)),
> +		DENY,  /* Don't passthrough into a label */
> +
> +		LABEL(&l, read),
> +		ARG(0),
> +		JNE(STDIN_FILENO, DENY),
> +		ARG(1),
> +		JNE((unsigned long)buf, DENY),
> +		ARG(2),
> +		JGE(sizeof(buf), DENY),
> +		ALLOW,
> +
> +		LABEL(&l, write_fd),
> +		ARG(0),
> +		JEQ(STDOUT_FILENO, JUMP(&l, write_buf)),
> +		JEQ(STDERR_FILENO, JUMP(&l, write_buf)),
> +		DENY,
> +
> +		LABEL(&l, write_buf),
> +		ARG(1),
> +		JEQ((unsigned long)msg1, JUMP(&l, msg1_len)),
> +		JEQ((unsigned long)msg2, JUMP(&l, msg2_len)),
> +		JEQ((unsigned long)buf, JUMP(&l, buf_len)),
> +		DENY,
> +
> +		LABEL(&l, msg1_len),
> +		ARG(2),
> +		JLT(sizeof(msg1), ALLOW),
> +		DENY,
> +
> +		LABEL(&l, msg2_len),
> +		ARG(2),
> +		JLT(sizeof(msg2), ALLOW),
> +		DENY,
> +
> +		LABEL(&l, buf_len),
> +		ARG(2),
> +		JLT(sizeof(buf), ALLOW),
> +		DENY,
> +	};
> +	struct seccomp_fprog prog = {
> +		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
> +		.filter = filter,
> +	};
> +	ssize_t bytes;
> +	bpf_resolve_jumps(&l, filter, sizeof(filter)/sizeof(*filter));
> +
> +	if (prctl(PR_ATTACH_SECCOMP_FILTER, &prog)) {
> +		perror("prctl");
> +		return 1;
> +	}
> +	syscall(__NR_write, STDOUT_FILENO, msg1, strlen(msg1));
> +	bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)-1);
> +	bytes = (bytes > 0 ? bytes : 0);
> +	syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2));
> +	syscall(__NR_write, STDERR_FILENO, buf, bytes);
> +	/* Now get killed */
> +	syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)+2);
> +	return 0;
> +}
> diff --git a/samples/seccomp/bpf-helper.c b/samples/seccomp/bpf-helper.c
> new file mode 100644
> index 0000000..e1b6bc7
> --- /dev/null
> +++ b/samples/seccomp/bpf-helper.c
> @@ -0,0 +1,89 @@
> +/*
> + * Seccomp BPF helper functions
> + *
> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@...omium.org>
> + * Author: Will Drewry <wad@...omium.org>
> + *
> + * The code may be used by anyone for any purpose,
> + * and can serve as a starting point for developing
> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
> + */
> +
> +#include <stdio.h>
> +#include <string.h>
> +
> +#include "bpf-helper.h"
> +
> +int bpf_resolve_jumps(struct bpf_labels *labels,
> +		      struct seccomp_filter_block *filter, size_t count)
> +{
> +	struct seccomp_filter_block *begin = filter;
> +	__u8 insn = count - 1;
> +
> +	if (count < 1)
> +		return -1;
> +	/*
> +	* Walk it once, backwards, to build the label table and do fixups.
> +	* Since backward jumps are disallowed by BPF, this is easy.
> +	*/
> +	filter += insn;
> +	for (; filter >= begin; --insn, --filter) {
> +		if (filter->code != (BPF_JMP+BPF_JA))
> +			continue;
> +		switch ((filter->jt<<8)|filter->jf) {
> +		case (JUMP_JT<<8)|JUMP_JF:
> +			if (labels->labels[filter->k].location == 0xffffffff) {
> +				fprintf(stderr, "Unresolved label: '%s'\n",
> +					labels->labels[filter->k].label);
> +				return 1;
> +			}
> +			filter->k = labels->labels[filter->k].location -
> +				    (insn + 1);
> +			filter->jt = 0;
> +			filter->jf = 0;
> +			continue;
> +		case (LABEL_JT<<8)|LABEL_JF:
> +			if (labels->labels[filter->k].location != 0xffffffff) {
> +				fprintf(stderr, "Duplicate label use: '%s'\n",
> +					labels->labels[filter->k].label);
> +				return 1;
> +			}
> +			labels->labels[filter->k].location = insn;
> +			filter->k = 0; /* fall through */
> +			filter->jt = 0;
> +			filter->jf = 0;
> +			continue;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/* Simple lookup table for labels. */
> +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label)
> +{
> +	struct __bpf_label *begin = labels->labels, *end;
> +	int id;
> +	if (labels->count == 0) {
> +		begin->label = label;
> +		begin->location = 0xffffffff;
> +		labels->count++;
> +		return 0;
> +	}
> +	end = begin + labels->count;
> +	for (id = 0; begin < end; ++begin, ++id) {
> +		if (!strcmp(label, begin->label))
> +			return id;
> +	}
> +	begin->label = label;
> +	begin->location = 0xffffffff;
> +	labels->count++;
> +	return id;
> +}
> +
> +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t count)
> +{
> +	struct seccomp_filter_block *end = filter + count;
> +	for ( ; filter < end; ++filter)
> +		printf("{ code=%u,jt=%u,jf=%u,k=%u },\n",
> +			filter->code, filter->jt, filter->jf, filter->k);
> +}
> diff --git a/samples/seccomp/bpf-helper.h b/samples/seccomp/bpf-helper.h
> new file mode 100644
> index 0000000..92b94ec
> --- /dev/null
> +++ b/samples/seccomp/bpf-helper.h
> @@ -0,0 +1,219 @@
> +/*
> + * Example wrapper around BPF macros.
> + *
> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@...omium.org>
> + * Author: Will Drewry <wad@...omium.org>
> + *
> + * The code may be used by anyone for any purpose,
> + * and can serve as a starting point for developing
> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
> + *
> + * No guarantees are provided with respect to the correctness
> + * or functionality of this code.
> + */
> +#ifndef __BPF_HELPER_H__
> +#define __BPF_HELPER_H__
> +
> +#include <asm/bitsperlong.h>	/* for __BITS_PER_LONG */
> +#include <linux/filter.h>
> +#include <linux/seccomp_filter.h>	/* for seccomp_filter_data.arg */
> +#include <linux/types.h>
> +#include <linux/unistd.h>
> +#include <stddef.h>
> +
> +#define BPF_LABELS_MAX 256
> +struct bpf_labels {
> +	int count;
> +	struct __bpf_label {
> +		const char *label;
> +		__u32 location;
> +	} labels[BPF_LABELS_MAX];
> +};
> +
> +int bpf_resolve_jumps(struct bpf_labels *labels,
> +		      struct seccomp_filter_block *filter, size_t count);
> +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label);
> +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t count);
> +
> +#define JUMP_JT 0xff
> +#define JUMP_JF 0xff
> +#define LABEL_JT 0xfe
> +#define LABEL_JF 0xfe
> +
> +#define ALLOW \
> +	BPF_STMT(BPF_RET+BPF_K, 0xFFFFFFFF)
> +#define DENY \
> +	BPF_STMT(BPF_RET+BPF_K, 0)
> +#define JUMP(labels, label) \
> +	BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
> +		 JUMP_JT, JUMP_JF)
> +#define LABEL(labels, label) \
> +	BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
> +		 LABEL_JT, LABEL_JF)
> +#define SYSCALL(nr, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (nr), 0, 1), \
> +	jt
> +
> +/* Lame, but just an example */
> +#define FIND_LABEL(labels, label) seccomp_bpf_label((labels), #label)
> +
> +#define EXPAND(...) __VA_ARGS__
> +/* Map all width-sensitive operations */
> +#if __BITS_PER_LONG == 32
> +
> +#define JEQ(x, jt) JEQ32(x, EXPAND(jt))
> +#define JNE(x, jt) JNE32(x, EXPAND(jt))
> +#define JGT(x, jt) JGT32(x, EXPAND(jt))
> +#define JLT(x, jt) JLT32(x, EXPAND(jt))
> +#define JGE(x, jt) JGE32(x, EXPAND(jt))
> +#define JLE(x, jt) JLE32(x, EXPAND(jt))
> +#define JA(x, jt) JA32(x, EXPAND(jt))
> +#define ARG(i) ARG_32(i)
> +
> +#elif __BITS_PER_LONG == 64
> +
> +#define JEQ(x, jt) \
> +	JEQ64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +#define JGT(x, jt) \
> +	JGT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +#define JGE(x, jt) \
> +	JGE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +#define JNE(x, jt) \
> +	JNE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +#define JLT(x, jt) \
> +	JLT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +#define JLE(x, jt) \
> +	JLE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +
> +#define JA(x, jt) \
> +	JA64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	       ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	       EXPAND(jt))
> +#define ARG(i) ARG_64(i)
> +
> +#else
> +#error __BITS_PER_LONG value unusable.
> +#endif
> +
> +/* Loads the arg into A */
> +#define ARG_32(idx) \
> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
> +		offsetof(struct seccomp_filter_data, args[(idx)].lo32))
> +
> +/* Loads hi into A and lo in X */
> +#define ARG_64(idx) \
> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
> +	  offsetof(struct seccomp_filter_data, args[(idx)].lo32)), \
> +	BPF_STMT(BPF_ST, 0), /* lo -> M[0] */ \
> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
> +	  offsetof(struct seccomp_filter_data, args[(idx)].hi32)), \
> +	BPF_STMT(BPF_ST, 1) /* hi -> M[1] */
> +
> +#define JEQ32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 0, 1), \
> +	jt
> +
> +#define JNE32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 1, 0), \
> +	jt
> +
> +/* Checks the lo, then swaps to check the hi. A=lo,X=hi */
> +#define JEQ64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 0, 2), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JNE64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 5, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 2, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JA32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (value), 0, 1), \
> +	jt
> +
> +#define JA64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (hi), 3, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (lo), 0, 2), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JGE32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 0, 1), \
> +	jt
> +
> +#define JLT32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 1, 0), \
> +	jt
> +
> +/* Shortcut checking if hi > arg.hi. */
> +#define JGE64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (lo), 0, 2), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JLT64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (hi), 0, 4), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JGT32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
> +	jt
> +
> +#define JLE32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
> +	jt

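As posted, JLE32 is identical to JGT32 just above it: both use BPF_JGT
with offsets (0, 1), so jt is taken when A > value.  Should the
true/false offsets be reversed here, i.e. "(value), 1, 0", the way
JNE32 and JLT32 invert JEQ32 and JGE32?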

Thanks for all the work on this.  We're looking forward to using it with 
QEMU.

> +
> +/* Check hi > args.hi first, then do the GE checking */
> +#define JGT64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 0, 2), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JLE64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 6, 0), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 3), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define LOAD_SYSCALL_NR \
> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
> +		 offsetof(struct seccomp_filter_data, syscall_nr))
> +
> +#endif  /* __BPF_HELPER_H__ */


-- 
Regards,
Corey
