linux-kernel - Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free

Message-ID: <87609247f5.fsf@xmission.com>
Date:   Tue, 19 Dec 2017 17:36:14 -0600
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Al Viro <viro@...IV.linux.org.uk>
Cc:     Giuseppe Scrivano <gscrivan@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, alexander.deucher@....com,
        broonie@...nel.org, chris@...is-wilson.co.uk,
        David Miller <davem@...emloft.net>, deepa.kernel@...il.com,
        Greg KH <gregkh@...uxfoundation.org>,
        luc.vanoostenryck@...il.com, lucien xin <lucien.xin@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        Neil Horman <nhorman@...driver.com>,
        syzkaller-bugs@...glegroups.com,
        Vladislav Yasevich <vyasevich@...il.com>
Subject: Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free

Al Viro <viro@...IV.linux.org.uk> writes:

> On Tue, Dec 19, 2017 at 03:49:24PM -0600, Eric W. Biederman wrote:
>> > what would you be delaying?  kmem_cache_alloc() for struct mount and assignments
>> > to its fields?  That's noise; if anything, I would expect the main cost with
>> > a plenty of containers to be in sget() scanning the list of mqueue superblocks.
>> > And we can get rid of that, while we are at it - to hell with mount_ns(), with
>> > that approach we can just use mount_nodev() instead.  The logics in
>> > mq_internal_mount() will deal with multiple instances - if somebody has already
>> > triggered creation of internal mount, all subsequent calls in that ipcns will
>> > end up avoiding kern_mount_data() entirely.  And if you have two callers
>> > racing - sure, you will get two superblocks.  Not for long, though - the first
>> > one to get to setting ->mq_mnt (serialized on mq_lock) wins, the second loses
>> > and promptly destroys his vfsmount and superblock.  I seriously suspect that
>> > variant below would cut down on the cost a whole lot more - as it is, we have
>> > the total of O(N^2) spent in the loop inside of sget_userns() when we create
>> > N ipcns and mount in each of those; this patch should cut that to
>> > O(N)...
>> 
>> If that is where the cost is, is there any point in delaying creating
>> the internal mount at all?
>
> We won't know without the profiles...  Incidentally, is there any point in
> using mount_ns() for procfs?  Similar scheme (with ->proc_mnt instead of
> ->mq_mnt, of course) would live with mount_nodev() just fine, and it's
> definitely less costly - we don't bother with the loop in sget_userns()
> at all that way.

The mechanisms of mqueuefs and proc are different for dealing with a
filesystem that continues to be mounted/referenced after the namespace
exits.

Proc actually takes a reference on the pid namespace so it is easier to
work with.  pid_ns_prepare_proc and pid_ns_release_proc are the
namespace side of that dependency.

So yes, we could look at a local cache in the namespace and all
would be well for proc.  I don't know what we would use for locking when
we start allowing more than one path to set it.
atomic_cmpxchg(&proc_mnt, NULL)?

That makes me suspect we could have a common helper that does the work.
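
To make the "first one to set it wins" idea concrete, here is a rough
userspace sketch of what such a helper might look like.  It uses C11
atomics in place of the kernel's cmpxchg(), and struct fake_ns, struct
fake_mount and ns_install_mount() are hypothetical names made up for
illustration, not existing kernel interfaces.  It only shows the
publish-or-lose step, not the mount creation or teardown around it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vfsmount; not a kernel type. */
struct fake_mount {
	int id;
};

/* Hypothetical namespace with a cached internal mount,
 * playing the role of ->proc_mnt or ->mq_mnt. */
struct fake_ns {
	_Atomic(struct fake_mount *) cached_mnt;
};

/*
 * Try to publish 'fresh' as the namespace's internal mount.
 * Returns the mount that is cached afterwards: 'fresh' if this caller
 * won the race, otherwise the earlier winner.  A losing caller is
 * expected to destroy its own 'fresh' mount.
 */
static struct fake_mount *ns_install_mount(struct fake_ns *ns,
					   struct fake_mount *fresh)
{
	struct fake_mount *expected = NULL;

	assert(fresh != NULL);

	/* Succeeds only if nobody has set cached_mnt yet. */
	if (atomic_compare_exchange_strong(&ns->cached_mnt, &expected, fresh))
		return fresh;

	/* On failure 'expected' was updated to the current value. */
	return expected;
}
```

A caller would create its mount, call ns_install_mount(), and drop its
own mount if the returned pointer is not the one it passed in.  Whether
a bare compare-and-exchange is enough for the real code depends on how
the teardown side is ordered, which this sketch does not model.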

I do know that the reason I moved proc to mount_ns is that it had simply
been open coding that function.

Eric