linux-kernel - Re: Using ftrace/perf as a basis for generic seccomp

Message-Id: <201102032306.34251.sf@sfritsch.de>
Date:	Thu, 3 Feb 2011 23:06:33 +0100
From:	Stefan Fritsch <sf@...itsch.de>
To:	Frederic Weisbecker <fweisbec@...il.com>,
	Eric Paris <eparis@...hat.com>, Ingo Molnar <mingo@...e.hu>
Cc:	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Eric Paris <eparis@...isplace.org>,
	linux-kernel@...r.kernel.org, agl@...gle.com, tzanussi@...il.com,
	Jason Baron <jbaron@...hat.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	2nddept-manager@....hitachi.co.jp,
	Steven Rostedt <rostedt@...dmis.org>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Using ftrace/perf as a basis for generic seccomp

Hi,

On Thursday 03 February 2011, Frederic Weisbecker wrote:
> I think you won't work with trace events, so you need to make the
> filtering code more tracing-agnostic.
> 
> But I think it's quite workable and shouldn't be too hard to split
> that into a filtering backend. Many parts are already pretty
> standalone.
> 
> Also I suspect the tracepoints are not what you need. Or may be
> they are. But as Masami said, the syscall tracepoint is called
> late. It's workable though. The other problem is that preemption
> is disabled when tracepoints are called, which is probably not
> what you want. One day I think we'll need to unify the tracepoints
> and notifier code but until then, better keep tracepoints for
> tracing.
> 
> Now once you have the filtering code more generic, you still
> need an arch backend to map register contents and layout into
> syscall arguments name and type. On top of which you can finally
> use the filtering code. For that you can use, again, some code we
> use for tracing, which are syscalls metadata: informations
> generated on build time that have syscalls fields and type.
> And that also needs to be split up, but it's more trivial
> than the filtering part.

AFAICS the infrastructure for tracing and metadata of compat syscalls
is also still missing. That would need to be added, too. Jason Baron
and Ian Munsie worked on this in mid-2010, but I don't know the
current status.

Considering that all this is still quite a bit of work and that the
initial suggestion by Adam Langley happened nearly two years ago,
maybe a two-step approach would be better:

Integrate a seccomp mode 2 now, which only supports a bitmask of
allowed syscalls and no filtering.

Then, when the infrastructure for the filtering is finished, add a 
seccomp mode 3 with support for filtering.
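
To make the mode 2 idea concrete, here is a minimal userspace sketch of
a per-process syscall bitmask check. It is not taken from Adam
Langley's patch; the names (sketch_syscall_mask, sketch_allow,
sketch_is_allowed) and the SKETCH_NR_SYSCALLS bound are made up for
illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on syscall numbers; not from any real patch. */
#define SKETCH_NR_SYSCALLS 512
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_WORDS \
	((SKETCH_NR_SYSCALLS + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* One bit per syscall number: 1 = allowed, 0 = denied. */
struct sketch_syscall_mask {
	unsigned long bits[SKETCH_MASK_WORDS];
};

/* Mark syscall number nr as allowed; out-of-range numbers are ignored. */
static void sketch_allow(struct sketch_syscall_mask *m, unsigned int nr)
{
	if (nr < SKETCH_NR_SYSCALLS)
		m->bits[nr / SKETCH_BITS_PER_WORD] |=
			1UL << (nr % SKETCH_BITS_PER_WORD);
}

/* Return true only if nr is in range and its bit is set. */
static bool sketch_is_allowed(const struct sketch_syscall_mask *m,
			      unsigned int nr)
{
	if (nr >= SKETCH_NR_SYSCALLS)
		return false;
	return (m->bits[nr / SKETCH_BITS_PER_WORD] >>
		(nr % SKETCH_BITS_PER_WORD)) & 1UL;
}
```

A zero-initialized mask denies everything, so a process would start
from an empty mask and only opt in to the syscalls it needs.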

This would give something in the very near future that is way more 
usable than seccomp mode 1. I think only the following adjustments 
would need to be made to Adam Langley's patch:

- only allow syscalls in the mode (non-compat/compat) that the prctl 
call was made in
- deny exec of setuid/setgid binaries
- deny exec of binaries with filesystem capabilities

What do you think of this proposal? I have a patch lying around 
somewhere that already does the first two of these.
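
Roughly, the three adjustments listed above amount to checks like the
following. This is a userspace sketch for discussion, not the patch I
mentioned; the types and function names are hypothetical, and
has_file_caps stands in for whatever the kernel would use to detect
filesystem capabilities on the binary.

```c
#include <stdbool.h>
#include <sys/stat.h>	/* S_ISUID, S_ISGID */
#include <sys/types.h>	/* mode_t */

/* Which syscall ABI a call came in through (hypothetical names). */
enum sketch_abi { SKETCH_ABI_NATIVE, SKETCH_ABI_COMPAT };

struct sketch_seccomp_state {
	enum sketch_abi abi_at_prctl;	/* ABI the prctl call was made in */
};

/* Adjustment 1: only allow syscalls made in the same mode as the prctl. */
static bool sketch_abi_ok(const struct sketch_seccomp_state *s,
			  enum sketch_abi current_abi)
{
	return s->abi_at_prctl == current_abi;
}

/* Adjustments 2 and 3: refuse exec of binaries that would gain privileges. */
static bool sketch_exec_ok(mode_t file_mode, bool has_file_caps)
{
	if (file_mode & (S_ISUID | S_ISGID))
		return false;	/* setuid or setgid binary */
	if (has_file_caps)
		return false;	/* binary carries filesystem capabilities */
	return true;
}
```

The ABI check would run on every syscall entry, while the exec check
only matters if exec itself is in the allowed set.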

BTW, given that the compat syscall layer tends to have security bugs,
being able to disable compat syscalls per process would already have
some value on its own. Is there a way to do this by disabling the int
0x80 and compat sysenter/syscall entry points per process? In my patch
I did a check in secure_computing(), which is of course less elegant.

> Note for now, filtering + syscalls metadata only works on top
> of raw arguments value. Syscalls metadata don't know much
> about type semantics and won't help you to dereference
> syscall argument pointers. Only raw syscall parameter values.
> Similarly, the filtering code can't evaluate pointer dereferencing
> expression evaluation, only direct values comprehension.

Pointer dereferencing at syscall entry must be avoided for seccomp
anyway, or there would be race conditions. Of course, if the filtering
points could be put after the final copy_from_user(), it would be ok.
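
To illustrate the race: if the filter reads user memory and the
syscall later reads it again, another thread can change the contents
in between. Filtering a private snapshot avoids that. Below is a
userspace sketch of the pattern; sketch_snapshot() is only a stand-in
for the kernel's copy from user memory, and the "/tmp/" policy is an
invented example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PATH_MAX 64

/* Stand-in for copying a string argument out of user memory exactly once. */
static bool sketch_snapshot(const char *user_buf, char *kbuf, size_t ksize)
{
	size_t len = 0;

	while (len < ksize && user_buf[len] != '\0')
		len++;
	if (len == ksize)
		return false;	/* does not fit, refuse rather than truncate */
	memcpy(kbuf, user_buf, len);
	kbuf[len] = '\0';
	return true;
}

/* Hypothetical policy: only paths under /tmp/ are allowed. */
static bool sketch_path_allowed(const char *path)
{
	return strncmp(path, "/tmp/", 5) == 0;
}

/*
 * Filter the snapshot, not the original. On success the caller must keep
 * using kbuf and never look at user_buf again.
 */
static bool sketch_filter_after_copy(const char *user_buf, char *kbuf,
				     size_t ksize)
{
	return sketch_snapshot(user_buf, kbuf, ksize) &&
	       sketch_path_allowed(kbuf);
}
```

Once the check has passed, later changes to the user buffer no longer
matter because the syscall only ever operates on the snapshot.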

Cheers,
Stefan