Message-ID: <CALCETrUSgmUMfX9L_d6MmXS=VONxDp5F1n-B5dkQw5yaLs-L-g@mail.gmail.com>
Date: Tue, 8 Sep 2015 15:55:07 -0700
From: Andy Lutomirski <luto@...capital.net>
To: "Eric W. Biederman" <ebiederm@...ssion.com>,
David Drysdale <drysdale@...gle.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Serge E. Hallyn" <serge@...lyn.com>
Subject: Re: RFC: fsyscall
On Tue, Sep 8, 2015 at 3:35 PM, Eric W. Biederman <ebiederm@...ssion.com> wrote:
>
> I was thinking a bit about the problem of allowing another process to
> perform a subset of what your process can perform, and it occurred to me
> there might be something conceptually simple we can do.
>
> Have a system call fsyscall that takes a file descriptor, the system call
> number, and the parameters to that system call as arguments. AKA
> long fsyscall(int fd, long number, ...); AKA syscall with a file
> descriptor argument.
>
> The fd would hold a struct cred, and a filter that limits what system
> calls and which parameters may be passed.
>
> The implementation of fsyscall would be something like:
>     old = override_creds(f->f_cred);
>     /* Perform filtered syscall */
>     revert_creds(old);
>
> Then we have another system call call it fsyscall_create(...) that takes
> a bpf filter and returns a file descriptor, that can be used with
> fsyscall.
>
> I'm not certain that bpf is the best way to create such a filter but it
> seems plausible, and we already have the infrastructure in place, so if
> nothing else there would be synergy in syscall filtering.
>
> My two concerns with bpf are (a) it seems a little complex for the
> simplest use cases. (b) I think there are cases like inspecting the data
> passed into write, or send, or the structure passed into ioctl that it
> doesn't handle well yet.
>
> Andy, does an fsyscall system call sound like something that would
> not be too bad to implement? (You have just been through all of the x86
> system call paths recently).

It's not possible yet due to nasty calling convention issues.
(Entries in the x86 syscall table aren't actually functions callable
using the C ABI right now.) My pending monster patchset will make it
possible to implement for 32-bit syscalls (native and compat). I'm
planning on addressing 64-bit, and I want to do almost the reverse of
what you're proposing: have a way that one task can trap into a
special mode in which another process can do syscalls on its behalf.

There are some syscalls for which this simply makes no sense.
Setresuid, capset, and similar come to mind. Clone and friends may
screw up impressively if you try this. fsyscall should not be allowed
to call itself. If you call write(2) like this and it has any
meaningful effect, something's wrong. keyctl(2) does really awful
things wrt struct cred, and I don't really want to think about what
happens if you try calling it like this.
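
As a rough illustration of the exclusions listed above, here is a minimal user-space sketch of a check that refuses to forward the calls named in this message. Everything in it is an assumption made for illustration: the `sk_` names, the enum values (which are not real kernel syscall numbers), and the idea of a simple deny table are hypothetical and are not part of any proposed patch.

```c
#include <stdbool.h>

/* Hypothetical identifiers for this sketch only; these are NOT real
 * kernel syscall numbers. */
enum sk_nr {
    SK_OPENAT,
    SK_SETRESUID,
    SK_CAPSET,
    SK_CLONE,
    SK_KEYCTL,
    SK_FSYSCALL,
    SK_NR_MAX
};

/* Calls the message names as senseless or risky to forward.
 * Entries not listed here default to false (not denied). */
static const bool sk_denied[SK_NR_MAX] = {
    [SK_SETRESUID] = true,  /* changes the caller's own credentials */
    [SK_CAPSET]    = true,  /* same concern as setresuid */
    [SK_CLONE]     = true,  /* "may screw up impressively" */
    [SK_KEYCTL]    = true,  /* manipulates struct cred in awkward ways */
    [SK_FSYSCALL]  = true,  /* fsyscall must not call itself */
};

/* Return true if the hypothetical fsyscall may forward call `nr`.
 * Unknown numbers are rejected rather than forwarded. */
bool sk_fsyscall_may_forward(long nr)
{
    if (nr < 0 || nr >= SK_NR_MAX)
        return false;
    return !sk_denied[nr];
}
```

A real filter would presumably run before any credential override takes place, so that a rejected call never executes with the fd's credentials.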

override_creds is IMO awful. Serge and I had an old discussion on how
to maybe fix it.

Honestly, I think the way to go might be to get Capsicum, or at least
Capsicum's fd model, merged and to add a mode in which the *at
operations on a specially marked fd use the passed fd's f_cred instead
of the caller's. (Cc: David Drysdale -- that feature might be really
nice.)
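
To make the credential-selection idea concrete, here is a minimal user-space model, not kernel code. The `sk_cred` and `sk_file` types, the `use_fd_cred` flag, and the `sk_at_cred` helper are all hypothetical stand-ins invented for this sketch; only the name `f_cred` mirrors the field mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct cred: just a uid. */
struct sk_cred {
    unsigned int uid;
};

/* Toy stand-in for struct file. */
struct sk_file {
    const struct sk_cred *f_cred; /* credentials captured when the fd was opened */
    bool use_fd_cred;             /* hypothetical "specially marked" flag */
};

/* Pick the credentials a hypothetical *at operation would run with:
 * the fd's f_cred if the fd is specially marked, otherwise the caller's. */
const struct sk_cred *sk_at_cred(const struct sk_cred *caller,
                                 const struct sk_file *dirf)
{
    if (dirf != NULL && dirf->use_fd_cred && dirf->f_cred != NULL)
        return dirf->f_cred;
    return caller;
}
```

In this model an unmarked fd behaves exactly as today, so the new behavior is opt-in per file descriptor.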

--Andy