Date:   Sat, 26 Jan 2019 01:41:04 +0000
From:   Roman Gushchin <guro@...com>
To:     Arkadiusz Miśkiewicz <a.miskiewicz@...il.com>
CC:     Tejun Heo <tj@...nel.org>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        Aleksa Sarai <asarai@...e.de>, Jay Kamat <jgkamat@...com>,
        Michal Hocko <mhocko@...e.com>,
        Johannes Weiner <hannes@...xchg.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: pids.current with invalid value for hours [5.0.0 rc3 git]

On Fri, Jan 25, 2019 at 08:47:57PM +0100, Arkadiusz Miśkiewicz wrote:
> On 25/01/2019 17:37, Tejun Heo wrote:
> > On Fri, Jan 25, 2019 at 08:52:11AM +0100, Arkadiusz Miśkiewicz wrote:
> >> On 24/01/2019 12:21, Arkadiusz Miśkiewicz wrote:
> >>> On 17/01/2019 14:17, Arkadiusz Miśkiewicz wrote:
> >>>> On 17/01/2019 13:25, Aleksa Sarai wrote:
> >>>>> On 2019-01-17, Arkadiusz Miśkiewicz <a.miskiewicz@...il.com> wrote:
> >>>>>> Using kernel 4.19.13.
> >>>>>>
> >>>>>> For one cgroup I noticed weird behaviour:
> >>>>>>
> >>>>>> # cat pids.current
> >>>>>> 60
> >>>>>> # cat cgroup.procs
> >>>>>> #
> >>>>>
> >>>>> Are there any zombies in the cgroup? pids.current is linked up directly
> >>>>> to __put_task_struct (so exit(2) won't decrease it, only the task_struct
> >>>>> actually being freed will decrease it).
> >>>>>
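A rough way to look for such lingering zombies, as a sketch (the cgroup v2
path "/test" below is an assumption; zombies, if any, still show up in
/proc with "State: Z"):

  # Sketch: report the state of every task /proc still knows about in the
  # suspect cgroup. The cgroup path "/test" is an assumption.
  for status in /proc/[0-9]*/status; do
      pid=${status#/proc/}; pid=${pid%/status}
      grep -q '^0::/test$' "/proc/$pid/cgroup" 2>/dev/null || continue
      grep -H '^State:' "$status"
  done
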
> >>>>
> >>>> There are no zombie processes.
> >>>>
> >>>> In the meantime the problem has shown up on multiple servers, and
> >>>> so far I have seen it only in cgroups that were OOMed.
> >>>>
> >>>> What changed on these servers (yesterday) is that memory.oom.group=1
> >>>> was turned on for all cgroups and memory.high was changed from 1G to
> >>>> "max" (leaving only the memory.max=2G limit).
> >>>>
> >>>> Previously there was no such problem.
> >>>>
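For reference, the configuration change described above would look roughly
like this (a sketch; the cgroup path is an assumption):

  CG=/sys/fs/cgroup/test            # cgroup name is an assumption
  echo 1   > "$CG/memory.oom.group" # kill the whole group together on OOM
  echo max > "$CG/memory.high"      # drop the throttling threshold
  echo 2G  > "$CG/memory.max"       # keep only the hard limit
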
> >>>
> >>> I'm attaching a reproducer. This time I tried on a different
> >>> distribution kernel (Arch Linux).
> >>>
> >>> After 60s pids.current still shows 37 processes even though there are
> >>> no processes running (according to ps aux).
> >>
> >>
> >> The same test on 5.0.0-rc3-00104-gc04e2a780caf and it's easy to
> >> reproduce the bug. No processes in the cgroup, but pids.current reports 91.
> > 
> > Can you please see whether the problem can be reproduced on the
> > current linux-next?
> > 
> >  git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> 
> I can reproduce it on linux-next (5.0.0-rc3-next-20190125), too:

How reliably can you reproduce it? I've tried to run your reproducer
several times with different parameters, but haven't been lucky so far.
What are your CPU count and total RAM size?

Can you please provide the corresponding dmesg output?

I've checked the code again, and my wild guess is that these missing
tasks are waiting (maybe hopelessly) for the OOM reaper. Dmesg output
might be very useful here.
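
A minimal sketch of collecting that information (the exact commands are
only a suggestion):

  dmesg -T | grep -iE 'oom|out of memory'   # OOM killer / oom_reaper messages
  nproc                                     # CPU count
  free -h                                   # total RAM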

Thanks!
