Message-ID: <8738qx1brz.fsf@xmission.com>
Date:	Mon, 29 Jul 2013 11:06:24 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Tejun Heo <tj@...nel.org>
Cc:	Michal Hocko <mhocko@...e.cz>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	cgroups@...r.kernel.org, containers@...ts.linux-foundation.org,
	linux-kernel@...r.kernel.org, kent.overstreet@...il.com,
	Li Zefan <lizefan@...wei.com>,
	Glauber Costa <glommer@...il.com>,
	Johannes Weiner <hannes@...xchg.org>
Subject: Re: memcg creates an unkillable task in 3.11-rc2

Tejun Heo <tj@...nel.org> writes:

> Hey, Eric.
>
> On Mon, Jul 29, 2013 at 10:03:35AM -0700, Eric W. Biederman wrote:
>> So this is not a simple matter of a frozen task not dying when SIGKILL
>> is received.  For the most part not dying when SIGKILL is received seems
>> like correct behavior for a frozen task.  Certainly it is correct
>> behavior for any other signal.
>> 
>> The issue is that the tasks don't freeze or that when thawed the SIGKILL
>> is still ignored.  It seems a wake up is being missed in there somewhere.
>
> That's actually interesting and shouldn't be happening.  Can you
> please provide more data as to what's going on while freezing?  It's
> likely that the problem is not caused by freezer per-se, the task
> might be stuck elsewhere and just fails to reach the freezing point.

Barring some infrastructure noise, what is happening is:

- Set up a hierarchy with memory and the freezer
  (disable the kernel oom killer and have a process watch for oom).
- In that memory cgroup, add a process with one thread per cpu.
- In one thread, slowly allocate memory (once per second; I think it is
  16M of RAM at a time) and mlock and dirty it (just to force the pages
  into RAM and keep them there).
- When oom is achieved, loop:
  * attempt to freeze all of the tasks.
  * if frozen, send every task SIGKILL, unfreeze, and remove the directory in cgroupfs.
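For reference, the freeze/kill/remove loop above can be sketched against the cgroup v1 control files (freezer.state, tasks). The helper names and directory layout here are mine, not from the actual reproducer, and a real version also has to poll freezer.state while it still reads FREEZING:

```python
import os
import signal

def write_ctl(cg_dir, name, value):
    """Write a value into a cgroup control file (e.g. freezer.state)."""
    with open(os.path.join(cg_dir, name), "w") as f:
        f.write(value)

def read_ctl(cg_dir, name):
    with open(os.path.join(cg_dir, name)) as f:
        return f.read().strip()

def freeze(cg_dir):
    # cgroup v1 freezer: write FROZEN to request a freeze; the file reads
    # back FREEZING until every task in the group has actually frozen.
    write_ctl(cg_dir, "freezer.state", "FROZEN")
    return read_ctl(cg_dir, "freezer.state") == "FROZEN"

def kill_all(cg_dir):
    # 'tasks' lists the PIDs attached to this cgroup (v1 interface).
    for line in read_ctl(cg_dir, "tasks").splitlines():
        if line:
            os.kill(int(line), signal.SIGKILL)

def reap(cg_dir):
    # One iteration of the loop above: freeze, kill, thaw, rmdir.
    if not freeze(cg_dir):
        return False          # freezing failed; caller retries
    kill_all(cg_dir)
    write_ctl(cg_dir, "freezer.state", "THAWED")
    os.rmdir(cg_dir)          # cgroupfs allows rmdir of an empty group
    return True
```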

The log message I am seeing says that the freezing fails.

So I don't actually know what is delivering SIGKILL.  It may be that the
oom situation is triggering it while we are attempting to freeze the tasks.
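One background fact worth keeping in mind here: from userspace, SIGKILL's disposition cannot be changed at all, so "the SIGKILL is still ignored" has to mean the kernel side is deferring or losing it, not that the task blocked it.  A quick sketch (my illustration, not part of the reproducer):

```python
import signal

def try_ignore(sig):
    """Try to set a signal's disposition to SIG_IGN; report whether
    the kernel accepted the change."""
    try:
        signal.signal(sig, signal.SIG_IGN)
        return True
    except OSError:
        # The kernel rejects attempts to change SIGKILL (and SIGSTOP).
        return False

print(try_ignore(signal.SIGTERM))  # True: ordinary signals can be ignored
print(try_ignore(signal.SIGKILL))  # False: EINVAL; only the kernel defers it
```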

> Would it be possible for memcg and freezer to deadlock?  Note that
> while freezing is in progress, some tasks will enter freezer earlier
> than others (of course) and won't respond to anything.  If memcg adds
> wait dependency among the tasks being frozen, it'll surely deadlock.

There may be a livelock.  But I have been able to unstick the processes
by simply echoing 0 > memory.oom_control.
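Spelled out, that unsticking step amounts to re-enabling the kernel OOM killer for the group (memory.oom_control is the cgroup v1 file; writing 0 clears oom_kill_disable; the helper name is mine):

```python
import os

def enable_kernel_oom(cg_dir):
    # cgroup v1: writing 0 to memory.oom_control clears oom_kill_disable,
    # letting the kernel OOM killer run again and reap the stuck tasks.
    with open(os.path.join(cg_dir, "memory.oom_control"), "w") as f:
        f.write("0")
```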

There may be a race where we haven't yet hit whatever is causing the
freezer to fail: a task is frozen, SIGKILL is delivered, the wakeup is
ignored, and then unfreezing doesn't redeliver it?

I need to explore some more.

>> A single unified hierarchy is a really nasty idea for the same set of
>> reasons. You have to recompile to disable a controller to see whether
>> that controller's bugs are causing problems on your production
>> system.  A recompile, or even just a reboot, is a very heavy hammer to
>> ask people to use when they are triaging a problem.
>
> For Nth time, unified hierarchy doesn't mean all controllers are
> enabled on all hierarchies or that controllers can't be bound and
> unbound dynamically.  Except for the removal of orthogonal
> hierarchies, things actually become a lot more dynamic.

Interesting.  So by unified hierarchy you just mean that the same
directory structure must exist for all mounts of cgroupfs?
If that is not it, we can wait until Plumbers and hash it out in person.

All I am really concerned about right now is the ability to easily toss
out questionable controllers/subsystems without having to recompile or
reboot.

>> I am also seeing what looks like a leak somewhere in the cgroup code
>> as well.  After some runs of the same reproducer I get into a state
>> where, after everything is cleaned up (all of the control groups have
>> been removed and the cgroup filesystem is unmounted), I can mount a
>> cgroup filesystem with that same combination of subsystems, but I
>> can't mount a cgroup filesystem with any of those subsystems in any
>> other combination.  So I am guessing that the superblock from the
>> original mounting is still lingering for some reason.
>
> Hmmm... yeah, if there are cgroups with refs remaining, that'd happen.
> Note that AFAIU memcg keeps the cgroups hanging around until all the
> pages are gone from it, so it could just be that it's still draining
> which may take a long time.  Maybe dropping cache would work?

Good suggestion.  I will have to play with that.
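For the record, dropping caches here means writing to /proc/sys/vm/drop_caches (1 frees the pagecache, 2 frees dentries and inodes, 3 frees both).  A sketch, with the path made a parameter only so the helper is easy to exercise:

```python
def drop_caches(level=3, path="/proc/sys/vm/drop_caches"):
    # 1 frees pagecache, 2 frees dentries and inodes, 3 frees both.
    # Against the real /proc path this needs root.
    with open(path, "w") as f:
        f.write(str(level))
```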

I also saw one case where, somehow, the directory that described one
of these weird blocked tasks was removed.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
