linux-kernel - Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]

Message-Id: <686E805C-81F3-43D0-A096-50C644C57EE3@amacapital.net>
Date:   Tue, 10 Jul 2018 16:59:34 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     David Howells <dhowells@...hat.com>
Cc:     viro@...iv.linux.org.uk, linux-api@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org, jannh@...gle.com
Subject: Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]

[cc Jann - you love this stuff]

> On Jul 10, 2018, at 3:44 PM, David Howells <dhowells@...hat.com> wrote:
> 
> Provide an fsopen() system call that starts the process of preparing to
> create a superblock that will then be mountable, using an fd as a context
> handle.  fsopen() is given the name of the filesystem that will be used:
> 
>    int mfd = fsopen(const char *fsname, unsigned int flags);

This is great in principle, but I think you’re seriously playing with fire with the API. 

> 
> where flags can be 0 or FSOPEN_CLOEXEC.
> 
> For example:
> 
>    sfd = fsopen("ext4", FSOPEN_CLOEXEC);
>    write(sfd, "s /dev/sdb1"); // note I'm ignoring write's length arg

Imagine some malicious program passes sfd as stdout to a setuid program, and that program gets persuaded to write “s /etc/shadow” to it. What happens? You’re okay as long as *every single fs* gets it right, but that’s asking a lot.

>    write(sfd, "o noatime");
>    write(sfd, "o acl");
>    write(sfd, "o user_attr");
>    write(sfd, "o iversion");
>    write(sfd, "o ");
>    write(sfd, "r /my/container"); // root inside the fs
>    write(sfd, "x create"); // create the superblock

From a cursory inspection of a bunch of the code, I think the expectation is that the actual device access happens in the “x” action. This is not okay. You can’t do this kind of thing in a write() handler unless you somehow make every single access use f_cred, which is a real pain.
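
The f_cred point can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct cred and struct file here are simplified stand-ins rather than the kernel's definitions, and may_use_source(), set_source_as_writer() and set_source_as_opener() are hypothetical names invented for the illustration. It shows why a check made against whoever calls write() can be satisfied by a setuid program that was handed the fd, while a check against the credentials captured at open time cannot.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for illustration; not the kernel's structures. */
struct cred {
    unsigned int uid;
};

struct file {
    struct cred f_cred; /* credentials of the task that opened the fd */
};

/* Toy policy (an assumption): only uid 0 may name /etc/shadow as a source. */
static bool may_use_source(const struct cred *c, const char *path)
{
    if (strcmp(path, "/etc/shadow") == 0)
        return c->uid == 0;
    return true;
}

/* Unsafe variant: checks the credentials of the task calling write(). */
bool set_source_as_writer(const struct file *f, const struct cred *writer,
                          const char *path)
{
    (void)f;
    return may_use_source(writer, path);
}

/* Safer variant: checks the credentials recorded when the fd was opened. */
bool set_source_as_opener(const struct file *f, const struct cred *writer,
                          const char *path)
{
    (void)writer;
    return may_use_source(&f->f_cred, path);
}
```

With an fd opened by uid 1000 and a write issued by a uid 0 process, the first variant permits "/etc/shadow" and the second refuses it. The difficulty raised above is that every access in every filesystem would have to follow the second pattern.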

I think the right solution is one of:

(a) Pass a netlink-formatted blob to fsopen() and do the whole thing in one syscall. I don’t mean using netlink sockets — just the nlattr format.  Or you could use a different format. The part that matters is using just one syscall to do the whole thing.

(b) Keep the current structure but use a new syscall instead of write().

(c) Keep using write() but literally just buffer the data. Then have a new syscall to commit it.  In other words, replace “x” with a syscall and call all the fs_context_operations helpers in that context instead of from write().
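
Option (c) can be sketched as a user-space model in which the write path only stores bytes and a separate commit step does all the interpretation. This is a minimal illustration, not the patch's code: struct fs_ctx_model, ctx_write() and ctx_commit() are hypothetical names, the fixed-size limits are arbitrary, and the "<letter> <argument>" validation is an assumption based on the example commands quoted above.

```c
#include <stddef.h>
#include <string.h>

#define CTX_MAX_CMDS 16
#define CTX_MAX_LEN  64

/* Hypothetical model of a configuration context behind the fd. */
struct fs_ctx_model {
    char cmds[CTX_MAX_CMDS][CTX_MAX_LEN]; /* raw buffered commands */
    size_t ncmds;                         /* number of buffered commands */
    size_t applied;                       /* commands acted on at commit */
    int committed;                        /* nonzero once commit succeeded */
};

/* write() path: copy the bytes and interpret nothing. Returns 0 or -1. */
int ctx_write(struct fs_ctx_model *ctx, const char *buf, size_t len)
{
    if (ctx->committed || ctx->ncmds == CTX_MAX_CMDS || len >= CTX_MAX_LEN)
        return -1;
    memcpy(ctx->cmds[ctx->ncmds], buf, len);
    ctx->cmds[ctx->ncmds][len] = '\0';
    ctx->ncmds++;
    return 0;
}

/*
 * Commit path: stands in for the new syscall. All parsing, and in a real
 * implementation all privileged work, would happen here, in the context
 * of the task that explicitly asked for it.
 */
int ctx_commit(struct fs_ctx_model *ctx)
{
    size_t i;

    if (ctx->committed)
        return -1;
    /* Validate everything first so a bad command applies nothing. */
    for (i = 0; i < ctx->ncmds; i++) {
        const char *c = ctx->cmds[i];
        if (c[0] == '\0' || c[1] != ' ')
            return -1;
    }
    ctx->applied = ctx->ncmds;
    ctx->committed = 1;
    return 0;
}
```

In this model a setuid program tricked into writing to the fd can at most add bytes to the buffer; nothing is opened or created until the owner of the context calls the commit step itself.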