linux-kernel - Re: Threads stuck in zap_pid_ns

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e7826335-3cb5-a4bf-f382-252f8f0a94a4@roeck-us.net>
Date:   Sat, 13 May 2017 07:34:13 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Vovo Yang <vovoy@...gle.com>, Ingo Molnar <mingo@...nel.org>,
        linux-kernel@...r.kernel.org
Subject: Re: Threads stuck in zap_pid_ns_processes()

On 05/12/2017 01:03 PM, Eric W. Biederman wrote:
> Guenter Roeck <linux@...ck-us.net> writes:
>
>> Hi Eric,
>>
>> On Fri, May 12, 2017 at 12:33:01PM -0500, Eric W. Biederman wrote:
>>> Guenter Roeck <linux@...ck-us.net> writes:
>>>
>>>> Hi Eric,
>>>>
>>>> On Fri, May 12, 2017 at 08:26:27AM -0500, Eric W. Biederman wrote:
>>>>> Vovo Yang <vovoy@...gle.com> writes:
>>>>>
>>>>>> On Fri, May 12, 2017 at 7:19 AM, Eric W. Biederman
>>>>>> <ebiederm@...ssion.com> wrote:
>>>>>>> Guenter Roeck <linux@...ck-us.net> writes:
>>>>>>>
>>>>>>>> What I know so far is
>>>>>>>> - We see this condition on a regular basis in the field. Regular is
>>>>>>>>   relative, of course - let's say maybe 1 in a Milion Chromebooks
>>>>>>>>   per day reports a crash because of it. That is not that many,
>>>>>>>>   but it adds up.
>>>>>>>> - We are able to reproduce the problem with a performance benchmark
>>>>>>>>   which opens 100 chrome tabs. While that is a lot, it should not
>>>>>>>>   result in a kernel hang/crash.
>>>>>>>> - Vovo proviced the test code last night. I don't know if this is
>>>>>>>>   exactly what is observed in the benchmark, or how it relates to the
>>>>>>>>   benchmark in the first place, but it is the first time we are actually
>>>>>>>>   able to reliably create a condition where the problem is seen.
>>>>>>>
>>>>>>> Thank you.  I will be interesting to hear what is happening in the
>>>>>>> chrome perfomance benchmark that triggers this.
>>>>>>>
>>>>>> What's happening in the benchmark:
>>>>>> 1. A chrome renderer process was created with CLONE_NEWPID
>>>>>> 2. The process crashed
>>>>>> 3. Chrome breakpad service calls ptrace(PTRACE_ATTACH, ..) to attach to every
>>>>>>   threads of the crashed process to dump info
>>>>>> 4. When breakpad detach the crashed process, the crashed process stuck in
>>>>>>   zap_pid_ns_processes()
>>>>>
>>>>> Very interesting thank you.
>>>>>
>>>>> So the question is specifically which interaction is causing this.
>>>>>
>>>>> In the test case provided it was a sibling task in the pid namespace
>>>>> dying and not being reaped.  Which may be what is happening with
>>>>> breakpad.  So far I have yet to see kernel bug but I won't rule one out.
>>>>>
>>>>
>>>> I am trying to understand what you are looking for. I would have thought
>>>> that both the test application as well as the Chrome functionality
>>>> described above show that there are situations where zap_pid_ns_processes()
>>>> can get stuck and cause hung task timeouts in conjunction with the use of
>>>> ptrace().
>>>>
>>>> Your last sentence seems to suggest that you believe that the kernel might
>>>> do what it is expected to do. Assuming this is the case, what else would
>>>> you like to see ? A test application which matches exactly the Chrome use
>>>> case ? We can try to provide that, but I don't entirely understand how
>>>> that would change the situation. After all, we already know that it is
>>>> possible to get a thread into this condition, and we already have one
>>>> means to reproduce it.
>>>>
>>>> Replacing TASK_UNINTERRUPTIBLE with TASK_INTERRUPTABLE works for both the
>>>> test application and the Chrome benchmark. The thread is still stuck in
>>>> zap_pid_ns_processes(), but it is now in S (sleep) state instead of D,
>>>> and no longer results in a hung task timeout. It remains in that state
>>>> until the parent process terminates. I am not entirely happy with it
>>>> since the processes are still stuck and may pile up over time, but at
>>>> least it solves the immediate problem for us.
>>>>
>>>> Question now is what to do with that solution. We can of course apply
>>>> it locally to Chrome OS, but I would rather have it upstream - especially
>>>> since we have to assume that any users of Chrome on Linux, or more
>>>> generically anyone using ptrace in conjunction with CLONE_NEWPID, may
>>>> experience the same problem. Right now I have no idea how to get there,
>>>> though. Can you provide some guidance ?
>>>
>>> Apologies for not being clear.  I intend to send a pull request with the
>>
>> No worries.
>>
>>> the TASK_UINTERRUPTIBLE to TASK_INTERRUPTIBLE change to Linus in the
>>> next week or so with a Cc stable and an appropriate Fixes tag.  So the
>>> fix can be backported.
>>>
>> Great, that is good to know.
>>
>>> I have a more comprehensive change queued I will probably merge for 4.13
>>> already but it just changes what kind of zombies you see.  It won't
>>> remove the ``stuck'' zombies.
>>>
>>> So what I am looking for now is:
>>> Why are things getting stuck in your benchmark?
>>>
>>> -  Is it a userspace bug?
>>>
>>>   In which case we can figure out what userspace (aka breakpad) needs
>>>    to do to avoid the problem.
>>>
>> I spent some time trying to understand what breakpad is doing. I don't
>> really see how it can be blamed. It attaches itself to a crashing thread,
>> gets the information it is looking for, and detaches itself. At the
>> same time, whatever it does, it seems to me that userspace should not
>> be able to create such a situation in the first place.
>
> I suspect what is happening is that the thread breakpad is attached to
> is killed by zap_pid_ns_processes.  At which point PTRACE_DETACH won't
> work because the thread has been killed and waitpid(thread_pid,...)
> needs to be used instead.
>

I think you nailed it. If I drop CLONE_NEWPID from the reproducer I get
a zombie process.

I guess the only question left is if zap_pid_ns_processes() should (or could)
somehow detect that situation and return instead of waiting forever.
What do you think ?

Thanks,
Guenter