[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202009241647.2239747F0@keescook>
Date: Thu, 24 Sep 2020 16:56:52 -0700
From: Kees Cook <keescook@...omium.org>
To: YiFei Zhu <zhuyifei1999@...il.com>
Cc: containers@...ts.linux-foundation.org,
YiFei Zhu <yifeifz2@...inois.edu>, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org, Aleksa Sarai <cyphar@...har.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Andy Lutomirski <luto@...capital.net>,
Dimitrios Skarlatos <dskarlat@...cmu.edu>,
Giuseppe Scrivano <gscrivan@...hat.com>,
Hubertus Franke <frankeh@...ibm.com>,
Jack Chen <jianyan2@...inois.edu>,
Jann Horn <jannh@...gle.com>,
Josep Torrellas <torrella@...inois.edu>,
Tianyin Xu <tyxu@...inois.edu>,
Tobin Feldman-Fitzthum <tobin@....com>,
Tycho Andersen <tycho@...ho.pizza>,
Valentin Rothberg <vrothber@...hat.com>,
Will Drewry <wad@...omium.org>
Subject: Re: [PATCH v2 seccomp 6/6] seccomp/cache: Report cache data through
/proc/pid/seccomp_cache
On Thu, Sep 24, 2020 at 07:44:21AM -0500, YiFei Zhu wrote:
> From: YiFei Zhu <yifeifz2@...inois.edu>
>
> Currently the kernel does not provide an infrastructure to translate
> architecture numbers to a human-readable name. Translating syscall
> numbers to syscall names is possible through FTRACE_SYSCALL
> infrastructure but it does not provide support for compat syscalls.
>
> This will create a file for each PID as /proc/pid/seccomp_cache.
> The file will be empty when no seccomp filters are loaded, or be
> in the format of:
> <hex arch number> <decimal syscall number> <ALLOW | FILTER>
> where ALLOW means the cache is guaranteed to allow the syscall,
> and filter means the cache will pass the syscall to the BPF filter.
>
> For the docker default profile on x86_64 it looks like:
> c000003e 0 ALLOW
> c000003e 1 ALLOW
> c000003e 2 ALLOW
> c000003e 3 ALLOW
> [...]
> c000003e 132 ALLOW
> c000003e 133 ALLOW
> c000003e 134 FILTER
> c000003e 135 FILTER
> c000003e 136 FILTER
> c000003e 137 ALLOW
> c000003e 138 ALLOW
> c000003e 139 FILTER
> c000003e 140 ALLOW
> c000003e 141 ALLOW
> [...]
>
> This file is guarded by CONFIG_PROC_SECCOMP_CACHE with a default
> of N because I think certain users of seecomp might not want the
> application to know which syscalls are definitely usable.
>
> I'm not sure if adding all the "human readable names" is worthwhile,
> considering it can be easily done in userspace.
The question of permissions is my central concern here: who should see
this? Some contained processes have been intentionally blocked from
self-introspection so even the "standard" high bar of "ptrace attach
allowed?" can't always be sufficient.
My compromise about filter visibility in the past was saying that
CAP_SYS_ADMIN was required (see seccomp_get_filter()). I'm nervous to
weaken this. (There is some work that hasn't been sent upstream yet that
is looking to expose the filter _contents_ via /proc that has been
nervous too.)
Now full contents vs "allow"/"filter" are certainly different things,
but I don't feel like I've got enough evidence to show that this
introspection would help debugging enough to justify the partially
imagined safety of not exposing it to potential attackers.
I suspect it _is_ the right thing to do (just look at my own RFC's
"debug" patch), but I'd like this to be well justified in the commit
log.
And yes, while it does hide behind a CONFIG, I'd still want it justified,
especially since distros have a tendency to just turn everything on
anyway. ;)
> + for (arch = 0; arch < ARRAY_SIZE(syscall_arches); arch++) {
> + for (nr = 0; nr < NR_syscalls; nr++) {
> + bool cached = test_bit(nr, f->cache.syscall_ok[arch]);
> + char *status = cached ? "ALLOW" : "FILTER";
> +
> + seq_printf(m, "%08x %d %s\n", syscall_arches[arch],
> + nr, status
> + );
> + }
> + }
But behavior-wise, yeah, I like it; I'm fine with human-readable and
full AUDIT_ARCH values. (Though, as devil's advocate again, to repeat
Jann's own words back: do we want to add this only to have a new UAPI to
support going forward?)
--
Kees Cook
Powered by blists - more mailing lists