[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170601214339.GB8803@roeck-us.net>
Date: Thu, 1 Jun 2017 14:43:39 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Vovo Yang <vovoy@...gle.com>, Ingo Molnar <mingo@...nel.org>,
linux-kernel@...r.kernel.org
Subject: Re: Threads stuck in zap_pid_ns_processes()
On Thu, Jun 01, 2017 at 02:36:38PM -0500, Eric W. Biederman wrote:
> Guenter Roeck <linux@...ck-us.net> writes:
>
> > On Thu, Jun 01, 2017 at 12:08:58PM -0500, Eric W. Biederman wrote:
> >> Guenter Roeck <linux@...ck-us.net> writes:
> >> >
> >> > I think you nailed it. If I drop CLONE_NEWPID from the reproducer I get
> >> > a zombie process.
> >> >
> >> > I guess the only question left is if zap_pid_ns_processes() should (or could)
> >> > somehow detect that situation and return instead of waiting forever.
> >> > What do you think ?
> >>
> >> Any chance you can point me at the chromium code that is performing the
> >> ptrace?
> >>
> >> I want to conduct a review of the kernel semantics to see if the current
> >> semantics make it unnecessarily easy to get into hang situations. If
> >> the semantics make it really easy to get into a hang situation I want
> >> to see if there is anything we can do to delicately change the semantics
> >> to avoid the hangs without breaking existing userspace.
> >>
> > The internal bug should be accessible to you.
> >
> > https://bugs.chromium.org/p/chromium/issues/detail?id=721298&desc=2
> >
> > It has some additional information, and points to the following code in Chrome.
> >
> > https://cs.chromium.org/chromium/src/breakpad/src/client/linux/minidump_writer/linux_ptrace_dumper.cc?rcl=47e51739fd00badbceba5bc26b8abc8bbd530989&l=85
> >
> > With the information we have, I don't really have a good idea what we could or
> > should change in Chrome to make the problem disappear, so I just concluded that
> > we'll have to live with the forever-sleeping task.
>
> I believe I see what is happening. The code makes the assumption that a
> thread will stay stopped and will not go away once ptrace attach
> completes.
>
> Unfortunately if someone sends SIGKILL to the process or exec sends
> SIGKILL to the individual thread then PTRACE_DETACH will fail.
>
> At which point you can use waitpid to reap the zombie and detach
> from the thread.
>
> So I think the forever-sleeping can be fixed with something as simple
> as changing ResumeThread to say:
>
> // Resumes a thread by detaching from it.
> static bool ResumeThread(pid_t pid) {
> if (sys_ptrace(PTRACE_DETACH, pid, NULL, NULL) >= 0)
> return true;
> /* Someone killed the thread? */
> return waitpid(pid, NULL, 0) == pid;
> }
>
> It almost certainly makes sense to fix PTRACE_DETACH in the kernel to
> allow this case to work. And odds are good that we could make that
> change without breaking anyone. So it is worth a try.
>
Do I interpret this correctly as "the above code should work, but currently
doesn't" ?
Thanks,
Guenter
Powered by blists - more mailing lists