lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87tw3z8pq1.fsf@xmission.com>
Date:   Thu, 01 Jun 2017 14:36:38 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Guenter Roeck <linux@...ck-us.net>
Cc:     Vovo Yang <vovoy@...gle.com>, Ingo Molnar <mingo@...nel.org>,
        linux-kernel@...r.kernel.org
Subject: Re: Threads stuck in zap_pid_ns_processes()

Guenter Roeck <linux@...ck-us.net> writes:

> On Thu, Jun 01, 2017 at 12:08:58PM -0500, Eric W. Biederman wrote:
>> Guenter Roeck <linux@...ck-us.net> writes:
>> >
>> > I think you nailed it. If I drop CLONE_NEWPID from the reproducer I get
>> > a zombie process.
>> >
>> > I guess the only question left is if zap_pid_ns_processes() should (or could)
>> > somehow detect that situation and return instead of waiting forever.
>> > What do you think ?
>> 
>> Any chance you can point me at the chromium code that is performing the
>> ptrace?
>> 
>> I want to conduct a review of the kernel semantics to see if the current
>> semantics make it unnecessarily easy to get into hang situations.  If
>> the semantics make it really easy to get into a hang situation I want
>> to see if there is anything we can do to delicately change the semantics
>> to avoid the hangs without breaking existing userspace.
>> 
> The internal bug should be accessible to you.
>
> https://bugs.chromium.org/p/chromium/issues/detail?id=721298&desc=2
>
> It has some additional information, and points to the following code in Chrome.
>
> https://cs.chromium.org/chromium/src/breakpad/src/client/linux/minidump_writer/linux_ptrace_dumper.cc?rcl=47e51739fd00badbceba5bc26b8abc8bbd530989&l=85
>
> With the information we have, I don't really have a good idea what we could or
> should change in Chrome to make the problem disappear, so I just concluded that
> we'll have to live with the forever-sleeping task.

I believe I see what is happening.  The code makes the assumption that a
thread will stay stopped and will not go away once ptrace attach
completes.

Unfortunately if someone sends SIGKILL to the process or exec sends
SIGKILL to the individual thread then PTRACE_DETACH will fail.

At which point you can use waitpid to reap the zombie and detach
from the thread.

So I think the forever-sleeping can be fixed with something as simple
as changing ResumeThread to say:

// Resumes a thread by detaching from it.
static bool ResumeThread(pid_t pid) {
  if (sys_ptrace(PTRACE_DETACH, pid, NULL, NULL) >= 0)
  	return true;
  /* Someone killed the thread? */
  return waitpid(pid, NULL, 0) == pid;
}

It almost certainly makes sense to fix PTRACE_DETACH in the kernel to
allow this case to work.  And odds are good that we could make that
change without breaking anyone.  So it is worth a try.

Eric

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ