linux-kernel - Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <338BC3C4-F3E7-48F0-A82E-2C7295B6640E@amacapital.net>
Date:   Thu, 12 Jul 2018 09:23:22 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     David Howells <dhowells@...hat.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Linux API <linux-api@...r.kernel.org>,
        Linux FS Devel <linux-fsdevel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Jann Horn <jannh@...gle.com>, tycho@...ho.ws
Subject: Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]



> On Jul 12, 2018, at 7:54 AM, David Howells <dhowells@...hat.com> wrote:
> 
> Andy Lutomirski <luto@...nel.org> wrote:
> 
>>> On Jul 11, 2018, at 12:22 AM, David Howells <dhowells@...hat.com> wrote:
>>> 
>>> Andy Lutomirski <luto@...capital.net> wrote:
>>> 
>>>>>  sfd = fsopen("ext4", FSOPEN_CLOEXEC);
>>>>>  write(sfd, "s /dev/sdb1"); // note I'm ignoring write's length arg
>>>> 
>>>> Imagine some malicious program passes sfd as stdout to a setuid
>>>> program. That program gets persuaded to write "s /etc/shadow".  What
>>>> happens?  You’re okay as long as *every single fs* gets it right, but
>>>> that’s asking a lot.
>>> 
>>> Do note that you must already have CAP_SYS_ADMIN to be able to call
>>> fsopen().
>> 
>> If you're not allowing it already, someone will want user namespace
>> root to be able to use this very, very soon.
> 
> Yeah, I'm sure.  And I've been thinking on how to deal with it.
> 
> I think we *have* to open the source files/devices with the creds of whoever
> called fsopen() or fspick() - that way you can't upgrade your privs by passing
> your context fd to a suid program.  To enforce this, I think it's simplest for
> fscontext_write() to call override_creds() right after taking the uapi_mutex
> and then call revert_creds() right before dropping the mutex.
> 

If you make a syscall that attaches a block device to an fscontext, you don’t need any of this.  Heck, someone might actually *want* to grab a block device from a different namespace.

All this override_creds() stuff is maybe okay if we were fixing an old broken thing. But this is brand new.  And having write() call override_creds() and do nontrivial things is a fascinating attack surface.

Just imagine what blows up if I abuse fscontext to open a block device on a path that traverses an AFS mount or /proc/.../fd or similar.  Or if I splice() from a network filesystem into fscontext.

(Al- can’t we just stop allowing splice() at all on things that don’t use iov_iter?)

> Another thing we might want to look at is to allow a supervisory process to
> examine the context before permitting the create/reconfigure action to
> proceed.  It might also be possible to do this through the LSM.

Cc Tycho. He’s working on this exact idea using seccomp. And he’d probably much, much prefer if configuration of an fscontext didn’t use a performance-critical syscall like write().

As a straw man, I suggest:

fsconfigure(contextfd, ADD_BLOCKDEV, dfd, path, flags);

fsconfigure(contextfd, ADD_OPTION, 0, “foo=bar”, flags);

Etc.