linux-kernel - Re: [TEST PATCH] pid: fix allocating pid 2 for init (was Re: proc_flush

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Sat, 23 Dec 2017 21:12:14 -0600
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Alexey Dobriyan <adobriyan@...il.com>
Cc:     Dave Jones <davej@...emonkey.org.uk>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Linux Kernel <linux-kernel@...r.kernel.org>,
        syzkaller-bugs@...glegroups.com, Gargi Sharma <gs051095@...il.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Rik van Riel <riel@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [TEST PATCH] pid: fix allocating pid 2 for init (was Re: proc_flush_task oops)

Alexey Dobriyan <adobriyan@...il.com> writes:

> On Fri, Dec 22, 2017 at 08:41:54AM -0600, Eric W. Biederman wrote:
>> Alexey Dobriyan <adobriyan@...il.com> writes:
>
>> > unshare
>> > fork
>> >     alloc_pid in level 1 succeeds
>> >     alloc_pid in level 0 fails, ->idr_next is 2
>> > fork
>> >     alloc pid 2
>> > exit
>> >
>> > Reliable reproducer and fail injection patch attached
>> >
>> > I'd say proper fix is allocating pids in the opposite order
>> > so that failure in the last layer doesn't move IDR cursor
>> > in baby child pidns.
>> 
>> I agree with you about changing the order.  That will make
>> the code simpler and in the second loop actually conforming C code,
>> and fix the immediate problem.
>
> Something like that (barely tested)

I have thought about this some more and I think we can do better.

I don't like the on stack pid_ns array.

The only reason the code calls disable_pid_allocation is that we
don't handle this error.

The semantics of least surprise, are that if we run out of resources
while allocating something, trying again when more resources are
available will make it work.

So it looks like handling the error will improve the quality of the
implemenation, and be a simpler, less dangerous patch.

diff --git a/kernel/pid.c b/kernel/pid.c
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -226,6 +224,10 @@ struct pid *alloc_pid(struct pid_namespace *ns)
        while (++i <= ns->level)
                idr_remove(&ns->idr, (pid->numbers + i)->nr);
 
+       /* On failure to allocate the first pid, reset the state */
+       if (ns->pid_allocated == PIDNS_ADDING)
+               idr_set_cursor(&ns->idr, 0);
+
        spin_unlock_irq(&pidmap_lock);
 
        kmem_cache_free(ns->pid_cachep, pid);

Eric