Message-ID: <CAG48ez2-p=jGZwmb3soh87RM6qHGEi6ctbLOVZ99LG6aHXUX_g@mail.gmail.com>
Date: Tue, 10 Jul 2018 18:14:10 -0700
From: Jann Horn <jannh@...gle.com>
To: Andy Lutomirski <luto@...capital.net>
Cc: David Howells <dhowells@...hat.com>,
Al Viro <viro@...iv.linux.org.uk>,
Linux API <linux-api@...r.kernel.org>,
linux-fsdevel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]

On Tue, Jul 10, 2018 at 4:59 PM Andy Lutomirski <luto@...capital.net> wrote:
>
> [cc Jann - you love this stuff]
>
> > On Jul 10, 2018, at 3:44 PM, David Howells <dhowells@...hat.com> wrote:
> >
> > Provide an fsopen() system call that starts the process of preparing to
> > create a superblock that will then be mountable, using an fd as a context
> > handle. fsopen() is given the name of the filesystem that will be used:
> >
> > int mfd = fsopen(const char *fsname, unsigned int flags);
>
> This is great in principle, but I think you’re seriously playing with fire with the API.
>
> >
> > where flags can be 0 or FSOPEN_CLOEXEC.
> >
> > For example:
> >
> > sfd = fsopen("ext4", FSOPEN_CLOEXEC);
> > write(sfd, "s /dev/sdb1"); // note I'm ignoring write's length arg
>
> Imagine some malicious program passes sfd as stdout to a setuid program. That program gets persuaded to write “s /etc/shadow”. What happens? You’re okay as long as *every single fs* gets it right, but that’s asking a lot.
>
> > write(sfd, "o noatime");
> > write(sfd, "o acl");
> > write(sfd, "o user_attr");
> > write(sfd, "o iversion");
> > write(sfd, "o ");
> > write(sfd, "r /my/container"); // root inside the fs
> > write(sfd, "x create"); // create the superblock
>
> From cursory inspection of a bunch of the code, I think the expectation is that the actual device access happens in the “x” action. This is not okay. You can’t do this kind of thing in a write() handler, unless you somehow make every single access using f_cred, which is a real pain.
>
> I think the right solution is one of:
>
> (a) Pass a netlink-formatted blob to fsopen() and do the whole thing in one syscall. I don’t mean using netlink sockets — just the nlattr format. Or you could use a different format. The part that matters is using just one syscall to do the whole thing.
>
> (b) Keep the current structure but use a new syscall instead of write().
>
> (c) Keep using write() but literally just buffer the data. Then have a new syscall to commit it. In other words, replace “x” with a syscall and call all the fs_context_operations helpers in that context instead of from write().

I also love ioctls, so I think you could use an ioctl to do the commit
instead of a new syscall. You can do anything (well, almost anything)
in ioctl context that you can do in syscall context; and when you
already have a file descriptor of a specific type that you want to
perform an operation on, an ioctl works just fine.
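
To make the "write() only buffers, a separate call commits" idea concrete,
here is a rough userspace model. It is not kernel code and not taken from
the patch set: fs_ctx_model, ctx_write(), ctx_commit() and the
committer_may_open_source flag are all made-up names, and the flag is only a
stand-in for whatever permission check the kernel would do against the
credentials of the task that issues the commit ioctl (or syscall).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of option (c); none of these names exist in the patch set. */
enum { CTX_BUF_MAX = 256 };

struct fs_ctx_model {
    char buf[CTX_BUF_MAX];  /* commands queued by write(), NUL-separated */
    size_t len;             /* number of bytes currently queued */
    int device_opens;       /* counts simulated accesses to the source device */
};

/*
 * Models the write() handler: it queues one command and does nothing else.
 * No parsing, no permission checks, no device access happen here, so it does
 * not matter whose credentials the writer is running with.
 */
int ctx_write(struct fs_ctx_model *ctx, const char *cmd)
{
    size_t n = strlen(cmd) + 1;  /* keep the NUL as a record separator */

    assert(ctx->len <= CTX_BUF_MAX);
    if (n > CTX_BUF_MAX - ctx->len)
        return -1;               /* buffer full */
    memcpy(ctx->buf + ctx->len, cmd, n);
    ctx->len += n;
    return 0;
}

/*
 * Models the commit ioctl/syscall: the queued commands are only interpreted
 * here, in the context of the caller that commits. committer_may_open_source
 * stands in for a real check against that caller's credentials.
 */
int ctx_commit(struct fs_ctx_model *ctx, bool committer_may_open_source)
{
    for (size_t off = 0; off < ctx->len; off += strlen(ctx->buf + off) + 1) {
        const char *cmd = ctx->buf + off;

        if (cmd[0] == 's' && cmd[1] == ' ') {
            if (!committer_may_open_source)
                return -1;       /* refuse; the queue is left untouched */
            ctx->device_opens++; /* simulated open of the source */
        }
        /* other command types ("o ...", "r ...") would be handled here */
    }
    ctx->len = 0;                /* queue consumed */
    return 0;
}
```

In this model a setuid program that gets tricked into writing
"s /etc/shadow" to the fd only appends bytes to the queue; whether the
source is ever opened depends on who calls ctx_commit().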