[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20071025185506.8c373aa8.pj@sgi.com>
Date: Thu, 25 Oct 2007 18:55:06 -0700
From: Paul Jackson <pj@....com>
To: Christoph Lameter <clameter@....com>
Cc: rientjes@...gle.com, akpm@...ux-foundation.org, ak@...e.de,
Lee.Schermerhorn@...com, linux-kernel@...r.kernel.org
Subject: Re: [patch 2/2] cpusets: add interleave_over_allowed option
Christoph wrote:
> With that MPOL_INTERLEAVE would be context dependent and no longer
> needs translation. Lee had similar ideas. Lee: Could we make
> MPOL_INTERLEAVE generally cpuset context dependent?
Well ... MPOL_INTERLEAVE already is essentially cpuset relative.
So long as the cpuset size (number of allowed memory nodes) doesn't
change, whatever MPOL_INTERLEAVE you set is remapped whenever the
cpusets 'mems' changes, preserving the cpuset relative interleaving.
The problem, as David explains, comes when cpusets change sizes.
When the cpuset gets smaller, one can still do a pretty good job,
scrunching down the interleave nodes in proportion. But when the
cpuset gets larger, it's not clear how to convert a subset of a
smaller set, to an equivalent subset of a larger set.
The existing code handled this last case by saying screw it -- don't
expand the set of interleave nodes when the cpuset 'mems' grows.
David's new code handles this last case by adding a new per-cpuset
Boolean that adds a new alternative, forcing all the tasks using
MPOL_INTERLEAVE in that cpuset, anytime thereafter that the cpusets
'mems' changes, to get interleaved over the entire cpuset.
Now that I spell it out that way, I am having second thoughts about
this one. It's another special case palliative, given that we can't
give the user what they really want.
David - could you describe the real world situation in which you
are finding that this new 'interleave_over_allowed' option, aka
'memory_spread_user', is useful? I'm not always opposed to special
case solutions; but they do usually require special case needs to
justify them ;).
I suspect that the general case solution would require having the user
pass in two nodemasks, call them ALL and SUBSET, requesting that
relative to the ALL nodes, interleave be done on the SUBSET nodes.
That way, even if say the task happened to be running in a cpuset with
a -single- allowed memory node at the moment, it could express its user
memory interleave memory needs for the general case of any number of
nodes. Then for whatever nodes were currently allowed by the cpuset
to that task at any point, the nodes_remap() logic could be done to
derive from the ALL and SUBSET masks, and the current allowed mask,
what nodes to interleave that tasks user allocations over.
This would require a new set_mempolicy API, and might not be worth it.
If David has a compelling use case, it is simple enough that it
might well be worth doing, even though it's not the general case
solution.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@....com> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists