[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20091104090116.fd319d1d.kamezawa.hiroyu@jp.fujitsu.com>
Date: Wed, 4 Nov 2009 09:01:16 +0900
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To: David Rientjes <rientjes@...gle.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
minchan.kim@...il.com, vedran.furac@...il.com,
Hugh Dickins <hugh.dickins@...cali.co.uk>
Subject: Re: [RFC][-mm][PATCH 1/6] oom-killer: updates for classification of
OOM
On Tue, 3 Nov 2009 12:18:40 -0800 (PST)
David Rientjes <rientjes@...gle.com> wrote:
> On Mon, 2 Nov 2009, KAMEZAWA Hiroyuki wrote:
>
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> >
> > Rewrite oom constarint to be up to date.
> >
> > (1). Now, at badness calculation, oom_constraint and other information
> > (which is available easily) are ignore. Pass them.
> >
> > (2)Adds more classes of oom constraint as _MEMCG and _LOWMEM.
> > This is just a change for interface and doesn't add new logic, at this stage.
> >
> > (3) Pass nodemask to oom_kill. Now alloc_pages() are totally rewritten and
> > it uses nodemask as its argument. By this, mempolicy doesn't have its own
> > private zonelist. So, Passing nodemask to out_of_memory() is necessary.
> > But, pagefault_out_of_memory() doesn't have enough information. We should
> > visit this again, later.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> > ---
> > drivers/char/sysrq.c | 2 -
> > fs/proc/base.c | 4 +-
> > include/linux/oom.h | 8 +++-
> > mm/oom_kill.c | 101 +++++++++++++++++++++++++++++++++++++++------------
> > mm/page_alloc.c | 2 -
> > 5 files changed, 88 insertions(+), 29 deletions(-)
> >
> > Index: mmotm-2.6.32-Nov2/include/linux/oom.h
> > ===================================================================
> > --- mmotm-2.6.32-Nov2.orig/include/linux/oom.h
> > +++ mmotm-2.6.32-Nov2/include/linux/oom.h
> > @@ -10,23 +10,27 @@
> > #ifdef __KERNEL__
> >
> > #include <linux/types.h>
> > +#include <linux/nodemask.h>
> >
> > struct zonelist;
> > struct notifier_block;
> >
> > /*
> > - * Types of limitations to the nodes from which allocations may occur
> > + * Types of limitations to zones from which allocations may occur
> > */
> > enum oom_constraint {
> > CONSTRAINT_NONE,
> > + CONSTRAINT_LOWMEM,
> > CONSTRAINT_CPUSET,
> > CONSTRAINT_MEMORY_POLICY,
> > + CONSTRAINT_MEMCG
> > };
> >
> > extern int try_set_zone_oom(struct zonelist *zonelist, gfp_t gfp_flags);
> > extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
> >
> > -extern void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order);
> > +extern void out_of_memory(struct zonelist *zonelist,
> > + gfp_t gfp_mask, int order, nodemask_t *mask);
> > extern int register_oom_notifier(struct notifier_block *nb);
> > extern int unregister_oom_notifier(struct notifier_block *nb);
> >
> > Index: mmotm-2.6.32-Nov2/mm/oom_kill.c
> > ===================================================================
> > --- mmotm-2.6.32-Nov2.orig/mm/oom_kill.c
> > +++ mmotm-2.6.32-Nov2/mm/oom_kill.c
> > @@ -27,6 +27,7 @@
> > #include <linux/notifier.h>
> > #include <linux/memcontrol.h>
> > #include <linux/security.h>
> > +#include <linux/mempolicy.h>
> >
> > int sysctl_panic_on_oom;
> > int sysctl_oom_kill_allocating_task;
> > @@ -55,6 +56,8 @@ static int has_intersects_mems_allowed(s
> > * badness - calculate a numeric value for how bad this task has been
> > * @p: task struct of which task we should calculate
> > * @uptime: current uptime in seconds
> > + * @constraint: type of oom_kill region
> > + * @mem: set if called by memory cgroup
> > *
> > * The formula used is relatively simple and documented inline in the
> > * function. The main rationale is that we want to select a good task
> > @@ -70,7 +73,9 @@ static int has_intersects_mems_allowed(s
> > * of least surprise ... (be careful when you change it)
> > */
> >
> > -unsigned long badness(struct task_struct *p, unsigned long uptime)
> > +static unsigned long __badness(struct task_struct *p,
> > + unsigned long uptime, enum oom_constraint constraint,
> > + struct mem_cgroup *mem)
> > {
> > unsigned long points, cpu_time, run_time;
> > struct mm_struct *mm;
> > @@ -193,30 +198,68 @@ unsigned long badness(struct task_struct
> > return points;
> > }
> >
> > +/* for /proc */
> > +unsigned long global_badness(struct task_struct *p, unsigned long uptime)
> > +{
> > + return __badness(p, uptime, CONSTRAINT_NONE, NULL);
> > +}
>
> I don't understand why this is necessary, CONSTRAINT_NONE should be
> available to proc_oom_score() via linux/oom.h. It would probably be
> better to not rename badness() and use it directly.
>
> > +
> > +
> > /*
> > * Determine the type of allocation constraint.
> > */
> > -static inline enum oom_constraint constrained_alloc(struct zonelist *zonelist,
> > - gfp_t gfp_mask)
> > -{
> > +
> > #ifdef CONFIG_NUMA
> > +static inline enum oom_constraint guess_oom_context(struct zonelist *zonelist,
> > + gfp_t gfp_mask, nodemask_t *nodemask)
>
> Why is this renamed from constrained_alloc()? If the new code is really a
> guess, we probably shouldn't be altering the oom killing behavior to kill
> innocent tasks if it's wrong.
>
No reasons. This just comes from my modification history.
I'll revert this part.
> > +{
> > struct zone *zone;
> > struct zoneref *z;
> > enum zone_type high_zoneidx = gfp_zone(gfp_mask);
> > - nodemask_t nodes = node_states[N_HIGH_MEMORY];
> > + enum oom_constraint ret = CONSTRAINT_NONE;
> >
> > - for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
> > - if (cpuset_zone_allowed_softwall(zone, gfp_mask))
> > - node_clear(zone_to_nid(zone), nodes);
> > - else
> > + /*
> > + * In numa environ, almost all allocation will be against NORMAL zone.
> > + * But some small area, ex)GFP_DMA for ia64 or GFP_DMA32 for x86-64
> > + * can cause OOM. We can use policy_zone for checking lowmem.
> > + */
> > + if (high_zoneidx < policy_zone)
> > + return CONSTRAINT_LOWMEM;
> > + /*
> > + * Now, only mempolicy specifies nodemask. But if nodemask
> > + * covers all nodes, this oom is global oom.
> > + */
> > + if (nodemask && !nodes_equal(node_states[N_HIGH_MEMORY], *nodemask))
> > + ret = CONSTRAINT_MEMORY_POLICY;
> > + /*
> > + * If not __GFP_THISNODE, zonelist containes all nodes. And if
> > + * zonelist contains a zone which isn't allowed under cpuset, we assume
> > + * this allocation failure is caused by cpuset's constraint.
> > + * Note: all nodes are scanned if nodemask=NULL.
> > + */
> > + for_each_zone_zonelist_nodemask(zone,
> > + z, zonelist, high_zoneidx, nodemask) {
> > + if (!cpuset_zone_allowed_softwall(zone, gfp_mask))
> > return CONSTRAINT_CPUSET;
> > + }
>
> This could probably be written as
>
> int nid;
> if (nodemask)
> for_each_node_mask(nid, *nodemask)
> if (!__cpuset_node_allowed_softwall(nid, gfp_mask))
> return CONSTRAINT_CPUSET;
>
> and then you don't need the struct zoneref or struct zone.
IIUC, typical allocation under cpuset is nodemask=NULL.
We'll have to scan zonelist.
The cpuset doesn't use nodemask at calling alloc_pages() because of its
softwall and hardwall feature and its nature of hierarchy.
Thanks,
-Kame
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists