[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.0.9999.0710261136500.11674@chino.kir.corp.google.com>
Date: Fri, 26 Oct 2007 11:45:17 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Lee Schermerhorn <Lee.Schermerhorn@...com>
cc: Christoph Lameter <clameter@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andi Kleen <ak@...e.de>, Paul Jackson <pj@....com>,
linux-kernel@...r.kernel.org
Subject: Re: [patch 2/2] cpusets: add interleave_over_allowed option
On Fri, 26 Oct 2007, Lee Schermerhorn wrote:
> That's what my "cpuset-independent interleave" patch does. David
> doesn't like the "null node mask" interface because it doesn't work with
> libnuma. I plan to fix that, but I'm chasing other issues. I should
> get back to the mempol work after today.
>
Hacking and requiring an updated version of libnuma to allow empty
nodemasks to be passed is a poor solution; if mempolicy's are supposed to
be independent from cpusets, then what semantics does an empty nodemask
actually imply when using MPOL_INTERLEAVE? To me, it means the entire
set_mempolicy() should be a no-op, and that's exactly how mainline
currently treats it _as_well_ as libnuma. So justifying this change in
the man page is respectible, but passing an empty nodemask just doesn't
make sense.
> What I like about the cpuset independent interleave is that the "policy
> remap" when cpusets are changed is a NO-OP--no need to change the
> policy. Just as "preferred local" policy chooses the node where the
> allocation occurs, my cpuset independent interleave patch interleaves
> across the set of nodes available at the time of the allocation. The
> application has to specifically ask for this behavior by the null/empty
> nodemask or the TBD libnuma API. IMO, this is the only reasonable
> interleave policy for apps running in dynamic cpusets.
>
Passing empty nodemasks with MPOL_INTERLEAVE to set_mempolicy() is the
only reasonable way of specifying you want, at all times, to interleave
over all available nodes? I doubt it.
I personally prefer an approach where cpusets take the responsibility for
determining how policies change (they use set_mempolicy() anyway to effect
their mems boundaries) because it's cpusets that has changed the available
nodemask out from beneath the application. So instead of trying to create
a solution where cpusets impact mempolicies and mempolicies impact
cpusets, it should only be in a single direction. Cpusets change the
set of available nodes and should update the attached tasks' mempolicies
at the same time. That's the same as saying that cpusets should be built
on top of mempolicies, which they are, and shouldn't have any reverse
dependency.
> An aside: if David et al [at google] are using cpusets on fake numa for
> resource management [I don't know this is the case, but saw some
> discussions way back that indicate it might be?], then maybe this
> becomes less of an issue when control groups [a.k.a. containers] and
> memory resource controls come to fruition?
>
Completely irrelevant; I care about the interaction between cpusets and
mempolicies in mainline Linux.
David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists