lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200901231515.37442.knikanth@suse.de>
Date:	Fri, 23 Jan 2009 15:15:36 +0530
From:	Nikanth Karthikesan <knikanth@...e.de>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Evgeniy Polyakov <zbr@...emap.net>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Chris Snook <csnook@...hat.com>,
	Arve Hjønnevåg <arve@...roid.com>,
	Paul Menage <menage@...gle.com>,
	containers@...ts.linux-foundation.org
Subject: Re: [RFC] [PATCH] Cgroup based OOM killer controller

On Thursday 22 January 2009 15:57:19 David Rientjes wrote:
> On Thu, 22 Jan 2009, Evgeniy Polyakov wrote:
> > > In an exclusive cpuset, a task's memory is restricted to a set of mems
> > > that the administrator has designated.  If it is oom, the kernel must
> > > free memory on those nodes or the next allocation will again trigger an
> > > oom (leading to a needlessly killed task that was in a disjoint
> > > cpuset).
> > >
> > > Really.
> >
> > The whole point of oom-killer is to kill the most appropriate task to
> > free the memory. And while task is selected system-wide and some
> > tunables are added to tweak the behaviour local to some subsystems, this
> > cpuset feature is hardcoded into the selection algorithm.
>
> Of course, because the oom killer must be aware that tasks in disjoint
> cpusets are more likely than not to result in no memory freeing for
> current's subsequent allocations.
>

Yes, the problem is cpuset does not track the tasks which has allocated from 
this node - who has either moved or changed it set of allowable nodes. And 
because of that it does not limit oom killing to the tasks with in those tasks 
and could kill some innocent tasks at times.

As it is unable to take deterministic decision as memcg does, it plays with 
badness value and only suggests but does not restricts within those tasks that 
has to be killed.

This bug is present even without this patch.

> > And when some tunable starts doing own calculation, behaviour of this
> > hardcoded feature changes.
>
> Yes, it is possible to elevate oom_adj scores to override the cpuset
> preference.  That's actually intended since it is now possible for the
> administrator to specify that, against the belief of the kernel, that
> killing a task will free memory in these cpuset-constrained ooms.  That's
> probably because it has either been moved to a different cpuset or its set
> of allowable nodes is dynamic.
>

This patch adds one more easier way for the administrator to over-ride.

> > > Then the scope of this new cgroup is restricted to not being used with
> > > cpusets that could oom.
> >
> > These are perpendicular tasks - cpusets limit one area of the oom
> > handling, cgroup order - another. Some people needs cpusets, others want
> > cgroups. cpusets are not something exceptional so that only they have to
> > be taken into account when doing system-wide operation like OOM
> > condition handling.
>
> A cpuset is a cgroup.  If I am using cpusets, this patch fails to
> adequately allow me to describe my oom preferences for both
> cpuset-constrained ooms and global unconstrained ooms, which is a major
> drawback.
>

The current cpuset oom handling has to be fixed and the exact problem of 
killing innocent processes exists even without the oom-controller.

Thanks
Nikanth
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ