linux-kernel - Re: [PATCH v3 02/13] tracing: split out syscall_trace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BANLkTimWvddKkVQA1qOc6KhxPUj8j2dWQg@mail.gmail.com>
Date:	Wed, 1 Jun 2011 12:15:17 -0500
From:	Will Drewry <wad@...omium.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	linux-kernel@...r.kernel.org, kees.cook@...onical.com,
	torvalds@...ux-foundation.org, tglx@...utronix.de,
	rostedt@...dmis.org, jmorris@...ei.org,
	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH v3 02/13] tracing: split out syscall_trace_enter construction

On Wed, Jun 1, 2011 at 2:00 AM, Ingo Molnar <mingo@...e.hu> wrote:
>
> * Will Drewry <wad@...omium.org> wrote:
>
>> perf appears to be the primary consumer of the CONFIG_FTRACE_SYSCALLS
>> infrastructure.  As such, many the helpers target at perf can be split
>> into a peerf-focused helper and a generic CONFIG_FTRACE_SYSCALLS
>> consumer interface.
>>
>> This change splits out syscall_trace_enter construction from
>> perf_syscall_enter for current into two helpers:
>> - ftrace_syscall_enter_state
>> - ftrace_syscall_enter_state_size
>>
>> And adds another helper for completeness:
>> - ftrace_syscall_exit_state_size
>>
>> These helpers allow for shared code between perf ftrace events and
>> any other consumers of CONFIG_FTRACE_SYSCALLS events.  The proposed
>> seccomp_filter patches use this code.
>>
>> Signed-off-by: Will Drewry <wad@...omium.org>
>> ---
>>  include/trace/syscall.h       |    4 ++
>>  kernel/trace/trace_syscalls.c |   96 +++++++++++++++++++++++++++++++++++------
>>  2 files changed, 86 insertions(+), 14 deletions(-)
>
> So, looking at the diffstat comparison again:
>
>       bitmask (2009):  6 files changed,  194 insertions(+), 22 deletions(-)
>  filter engine (2010): 18 files changed, 1100 insertions(+), 21 deletions(-)
>  event filters (2011):  5 files changed,   82 insertions(+), 16 deletions(-)
>
> you went back to the middle solution again which is the worst of them
> - why?

In short, design for the future and implement now.  I'll elaborate a
bit more below.

> If you want this to be a stupid, limited hack then go for the v1
> bitmask.

I only aim for the finest!

(bitmasks were bad for the other consumers of this patch series:
socketcall mulitplexing issues and ioctl # filtering).

> If you agree with my observation that filters allow the clean
> user-space implementation of LSM equivalent security solutions (of
> which sandboxes are just a *narrow special case*) then please use the
> main highlevel abstraction we have defined around them: event
> filters.

I agree that LSM-equivalent security solutions can be moved over to an
ftrace based infrastructure.  However, LSMs and seccomp have different
semantics.  Reducing the kernel attack surface in a
"sandboxing"-sort-of-way requires a default-deny interface that is
resilient to kernel changes (like new system calls) without
immediately degrading robustness.  LSMs provide a fail-open mechanism
for taking an active role in kernel-defined pinch points.  It is
possible to implement a default-deny LSM, but it requires a "hook" for
every security event and the addition of a security event results in a
hole in the not-so-default-deny infrastructure.  ftrace + event
filters are the same.

Based on my observations while exploring the code, it appears that the
LSM security_* calls could easily become active trace events and the
LSM infrastructure moved over to use those as tracepoints or via
event_filters.  There will be a need for new predicates for the
various new types (inode *, etc), and so on.  However, the
trace_sys_enter/__secure_computing model will still be a special case.
 Even if they fed into security event subsystem or something like
that, the absence of filters on a traced process would need to
default-deny as well as when there are no active matches.  So while a
brand-new shared ABI may be possible (security_event_open,
active_event_open, ?), there will still be trickiness in making the
behaviors not have implicit side effects and ensure that newly added
system calls, for instance, that lack the macro wrapper don't poke a
hole in the "sandbox" model.  There are a lot of options for designing
it though.  Like making TIF_SECCOMP mean that any security_* filter
failure or match count of 0 == process death.  It's just that
designing this new approach will be incredibly hairy, and we really
lack many of the concrete requirements that would be needed, in my
opinion.

> Now, my observation was not uncontested so let me try to sum up the
> rather large discussion that erupted around it, as i see it.
>
> I saw four main counter arguments:
>
>  - "Sandboxing is special and should stay separate from LSMs."
>
>   I think this is a technically bogus argument, see:
>
>         https://lkml.org/lkml/2011/5/26/85
>
>   That answer of mine went unchallenged.

I may have spoken to this above.  I dunno.

>  - "Events should only be observers."
>
>   Even ignoring the question of why on earth it should be a problem
>   for a willing call-site to use event filtering results sensibly,
>   this argument misses the plain fact that events are *already*
>   active participants, see:
>
>         http://www.spinics.net/lists/mips/msg41075.html
>
>   That answer of mine went unchallenged too.
>
>  - "This feature is too simplistic."
>
>   That's wrong i think, the feature is highly flexible:
>
>         http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg51387.html
>
>   This reply of mine went unchallenged as well.

Well I did only implement a PoC.  It couldn't handle attack surface
reduction after-the-fact, nor did I add a GET_FILTER call, etc.  The
code was minimal in many ways because the functionality was too.

>  - "Is this feature actually useful enough for applications, does it
>    justify the complexity?"
>
>  This is the *only* valid technical counter-argument i saw, and it's
>  a crutial one that is not fully answered yet. Since i think the feature
>  is an LSM equivalent i think it's at least as useful as any LSM is.
>
>  - [ if i missed any important argument then someone please insert it
>     here. ]
>
> But what you do here is to use the filter engine directly which is
> both a limited hack *and* complex (beyond the linecount it doubles
> our ABI exposure, amongst other things), so i find that approach
> rather counter-productive, now that i've seen the real thing.
>
> Will this feature be just another example of the LSM status quo
> dragging down a newcomer into the mud, until it's just as sucky and
> limited as any existing LSMs? That would be a sad outcome!

I hope not.  I believe it will be easy to move the backend of
seccomp_filter over to a per-task ftrace event filter infrastructure
when that comes in the future.  But for now, I'm trying to meet the
needs of possible consumers now: chromium, qemu, lxc, and lay
groundwork for a ftrace-future.

If this is a total fail, then perhaps we should have a separate
discussion over how we can tackle a lot of these needs.  I was hoping
that we could push some of that off to the LinuxSecuritySummit -- I've
proposed/requested a QA panel on this topic :)  But I'd love to not
wait until then for everything.

> ps. Please start a new discussion thread for the next iteration!
>    This one is *way* too deep already.

Sorry - will do!

thanks!
will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/