[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <47B0ACBA.4090207@oracle.com>
Date: Mon, 11 Feb 2008 12:14:50 -0800
From: Randy Dunlap <randy.dunlap@...cle.com>
To: David Rientjes <rientjes@...gle.com>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Paul Jackson <pj@....com>,
Christoph Lameter <clameter@....com>,
Lee Schermerhorn <Lee.Schermerhorn@...com>,
Andi Kleen <ak@...e.de>, linux-kernel@...r.kernel.org
Subject: Re: [patch 4/4 v2] mempolicy: update NUMA memory policy documentation
David Rientjes wrote:
> Updates Documentation/vm/numa_memory_policy.txt and
> Documentation/filesystems/tmpfs.txt to describe optional mempolicy mode
> flags.
>
> Cc: Paul Jackson <pj@....com>
> Cc: Christoph Lameter <clameter@....com>
> Cc: Lee Schermerhorn <Lee.Schermerhorn@...com>
> Cc: Andi Kleen <ak@...e.de>
> Cc: Randy Dunlap <randy.dunlap@...cle.com>
> Signed-off-by: David Rientjes <rientjes@...gle.com>
Acked-by: Randy Dunlap <randy.dunlap@...cle.com>
Thanks.
> ---
> Includes fixes to problems identified by Randy Dunlap.
>
> Documentation/filesystems/tmpfs.txt | 11 ++++++++
> Documentation/vm/numa_memory_policy.txt | 41 +++++++++++++++++++++++++++----
> 2 files changed, 47 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
> --- a/Documentation/filesystems/tmpfs.txt
> +++ b/Documentation/filesystems/tmpfs.txt
> @@ -92,6 +92,17 @@ NodeList format is a comma-separated list of decimal numbers and ranges,
> a range being two hyphen-separated decimal numbers, the smallest and
> largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15
>
> +It is possible to specify a static NodeList by appending '=static' to
> +the memory policy mode in the mpol= argument. This will require that
> +tasks or VMA's restricted to a subset of allowed nodes are only allowed
> +to effect the memory policy over those nodes. No remapping of the
> +NodeList when the policy is rebound, which is the default behavior, is
> +allowed when '=static' is specified. For example:
> +
> +mpol=bind=static:NodeList will only allocate from each node in
> + the NodeList without remapping the
> + NodeList if the policy is rebound
> +
> Note that trying to mount a tmpfs with an mpol option will fail if the
> running kernel does not support NUMA; and will fail if its nodelist
> specifies a node which is not online. If your system relies on that
> diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt
> --- a/Documentation/vm/numa_memory_policy.txt
> +++ b/Documentation/vm/numa_memory_policy.txt
> @@ -135,9 +135,11 @@ most general to most specific:
>
> Components of Memory Policies
>
> - A Linux memory policy is a tuple consisting of a "mode" and an optional set
> - of nodes. The mode determine the behavior of the policy, while the
> - optional set of nodes can be viewed as the arguments to the behavior.
> + A Linux memory policy consists of a "mode", optional mode flags, and an
> + optional set of nodes. The mode determines the behavior of the policy,
> + the optional mode flags determine the behavior of the mode, and the
> + optional set of nodes can be viewed as the arguments to the policy
> + behavior.
>
> Internally, memory policies are implemented by a reference counted
> structure, struct mempolicy. Details of this structure will be discussed
> @@ -145,7 +147,12 @@ Components of Memory Policies
>
> Note: in some functions AND in the struct mempolicy itself, the mode
> is called "policy". However, to avoid confusion with the policy tuple,
> - this document will continue to use the term "mode".
> + this document will continue to use the term "mode". Since the mode and
> + optional mode flags are stored in the same struct mempolicy member
> + (specifically, pol->policy), you must use mpol_mode(pol->policy) to
> + access only the mode and mpol_flags(pol->policy) to access only the
> + flags. Any function with a formal of type enum mempolicy_mode only
> + refers to the mode.
>
> Linux memory policy supports the following 4 behavioral modes:
>
> @@ -231,6 +238,28 @@ Components of Memory Policies
> the temporary interleaved system default policy works in this
> mode.
>
> + Linux memory policy supports the following optional mode flag:
> +
> + MPOL_F_STATIC_NODES: This flag specifies that the nodemask passed by
> + the user should not be remapped if the task or VMA's set of accessible
> + nodes changes after the memory policy has been defined.
> +
> + Without this flag, anytime a mempolicy is rebound because of a
> + change in the set of accessible nodes, the node (Preferred) or
> + nodemask (Bind, Interleave) is remapped to the new set of
> + accessible nodes. This may result in nodes being used that were
> + previously undesired. With this flag, the policy is either
> + effected over the user's specified nodemask or the Default
> + behavior is used.
> +
> + For example, consider a task that is attached to a cpuset with
> + mems 1-3 that sets an Interleave policy over the same set. If
> + the cpuset's mems change to 3-5, the Interleave will now occur
> + over nodes 3, 4, and 5. With this flag, however, since only
> + node 3 is accessible from the user's nodemask, the "interleave"
> + only occurs over that node. If no nodes from the user's
> + nodemask are now accessible, the Default behavior is used.
> +
> MEMORY POLICY APIs
>
> Linux supports 3 system calls for controlling memory policy. These APIS
> @@ -251,7 +280,9 @@ Set [Task] Memory Policy:
> Set's the calling task's "task/process memory policy" to mode
> specified by the 'mode' argument and the set of nodes defined
> by 'nmask'. 'nmask' points to a bit mask of node ids containing
> - at least 'maxnode' ids.
> + at least 'maxnode' ids. Optional mode flags may be passed by
> + combining the 'mode' argument with the flag (for example:
> + MPOL_INTERLEAVE | MPOL_F_STATIC_NODES).
>
> See the set_mempolicy(2) man page for more details
>
--
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists