lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wgx6VQW192hbUiZABmkmZjNDynH75OR=-wvg=un960nRA@mail.gmail.com>
Date:   Mon, 11 Oct 2021 13:33:21 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Dave Jones <davej@...emonkey.org.uk>,
        Linux Kernel <linux-kernel@...r.kernel.org>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Oleg Nesterov <oleg@...hat.com>
Subject: Re: Should EXIT_DEAD be visible to userspace ?

On Mon, Oct 11, 2021 at 12:40 PM Dave Jones <davej@...emonkey.org.uk> wrote:
>
> One of our users reported a crash in some userspace tooling this
> morning, which scrapes /proc/pid to gather stack traces, process states
> etc of everything running at the time.
>
> The crash occurred because it fell over an unexpected task state,
> which was 'X'.   According to the procps man-pages, this state should
> never be seen, but here it clearly was.

Heh.

> The kernel running at the time was kinda old (5.2) but I don't see much
> change in the EXIT_DEAD space that would explain something that got
> fixed subsequently.   It's also probably going to be difficult to
> reproduce unfortunately.
>
> So my question is, is procps wrong and code should expect to see X state
> processes in proc ?  The code in question is being hardened to handle
> unexpected inputs, but I'm curious if the kernel is leaking some state
> that it shouldn't.

My gut feel is that the man-pages have clearly been wrong - or at
least misleading - for at least the last couple of years (and possibly
longer), and this is the first report we've ever had of it actually
causing problems.

The docs *do* mention 'X'. Even if they say 'should never be seen',
it's not like it's not right there.

So we could either ask to just have the man-pages fixed to be a little
less strongly worded ("never" -> "seldom" or whatever). And apparently
procps is already getting fixed.

Or we could hide the 'X' state in newer kernels, and just call them
zombies to user space. We could literally just change the string from
"X (dead)" to "Z (dead)" and the "dead" part would still be there (and
different from "Z (zombie)").

And either way, it's likely not going to be something that people will
notice ever again. You update your system, and you wouldn't see the
problem, because whether the kernel was changed or not, procps was
updated.

And if the argument is that people didn't update procps, but *did*
update the kernel, then sure, that could avoid somebody hitting this
again, but that's where the "at least a couple of years and nobody has
noticed before" comes in.

So I can certainly take a patch that hides 'X', and we can even mark
it for stable.

But it feels like realistically nobody will actually care, and the
real fix is the one to procps, and that fix will make any kernel
change irrelevant (and possibly even a slight negative, since now
procps might report interesting cases?).

End result: if somebody cares enough and sends me a tested patch, I'll
apply it. But I personally wouldn't care much, and wouldn't push for
it unless we get somebody who does.

And I really think that "should never be seen" in the docs is just wrong.

Yes, we hold the task lock in task_state() when you look at
/proc/<pid>/status. So maybe it will never be seen there, because
maybe (I didn't check) we move from X->Z while holding the lock.

But other parts of /proc don't actually do that lock_task(), I think.
IOW, looking at /proc/<pid>/stat (which shows just the first letter of
the state), doesn't do that, I think. So it's not actually serialized
with the process going through its dying moments.

So I _think_ the "never" was always just "almost never".

In fact, take a look at commit ad86622b478e ("wait: swap EXIT_ZOMBIE
and EXIT_DEAD to hide EXIT_TRACE from user-space"). That 'X' has been
seen before. It's not great when it happens, but I think this is an
example of "the 'never' has never been true, and goes back to at least
2014".

.. and that 2014 was just when we happened on it before. The actual
issue of 'X' showing up looks like it probably predates it (and likely
goes back to before git history).

So I suspect that stray 'X' is not actually a regression. Just rare
enough to be _almost_ "never".

                     Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ