lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1205167173.5579.56.camel@localhost>
Date:	Mon, 10 Mar 2008 12:39:33 -0400
From:	Lee Schermerhorn <Lee.Schermerhorn@...com>
To:	David Rientjes <rientjes@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Paul Jackson <pj@....com>, Christoph Lameter <clameter@....com>,
	Andi Kleen <ak@...e.de>,
	Randy Dunlap <randy.dunlap@...cle.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [patch -mm v2] mempolicy: disallow static or relative flags
	for local preferred mode

On Sat, 2008-03-08 at 14:14 -0800, David Rientjes wrote:
> MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES don't mean anything for
> MPOL_PREFERRED policies that were created with an empty nodemask (for
> purely local allocations).  They'll never be invalidated because the
> allowed mems of a task changes or need to be rebound relative to a
> cpuset's placement.

Good.

> 
> Also fixes a bug identified by Lee Schermerhorn that disallowed empty
> nodemasks to be passed to MPOL_PREFERRED to specify local allocations.

I tested this patch atop David's other patches on 2-6.25-rc3-mm1 and I
agree that it fixes the regression with local allocations.  It also
preserves existing behavior for MPOL_DEFAULT.  

However, [and I'm going to sound like the "Nodemask Nazi" again, sorry]
it does change the behavior of MPOL_PREFERRED when one specifies a
non-empty nodemask with preferred/all node[s] outside of the
mems_allowed mask and neither MPOL_F_{STATIC|RELATIVE}_NODES are
specified.

Existing behavior is to return EINVAL in this case.  I had tried to
preserve this behavior with my
"silently-restrict-nodemask-to-allowed-nodes" patch with those ugly
"was_empty" and "is_empty" variables to capture the state of the
nodemask before and after we remove !allowed nodes.  I also tried to
preserve this behavior in the fix I had sent to the local alloc
regression, by passing a NULL nodemask pointer to the create op for
localalloc.

With this patch, we silently fall back to local allocation for
MPOL_PREFERRED.  This occurs because by the time mpol_new_preferred()
sees the nodemask, we may have masked off all nodes with the
current_mems_allowed mask.  I have a proposed patch below to address
this [atop this one].

Another comment in-line below.

Lee
> 
> Cc: Paul Jackson <pj@....com>
> Cc: Christoph Lameter <clameter@....com>
> Cc: Lee Schermerhorn <Lee.Schermerhorn@...com>
> Cc: Andi Kleen <ak@...e.de>
> Cc: Randy Dunlap <randy.dunlap@...cle.com>
> Signed-off-by: David Rientjes <rientjes@...gle.com>
> ---
>  Documentation/vm/numa_memory_policy.txt |   16 ++++++++++++++--
>  mm/mempolicy.c                          |   24 +++++++++++++++++-------
>  2 files changed, 31 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt
> --- a/Documentation/vm/numa_memory_policy.txt
> +++ b/Documentation/vm/numa_memory_policy.txt
> @@ -210,6 +210,12 @@ Components of Memory Policies
>  	    local allocation for a specific range of addresses--i.e. for
>  	    VMA policies.
>  
> +	    It is possible for the user to specify that local allocation is
> +	    always preferred by passing an empty nodemask with this mode.
> +	    If an empty nodemask is passed, the policy cannot use the
> +	    MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES flags described
> +	    below.
> +
>  	MPOL_INTERLEAVED:  This mode specifies that page allocations be
>  	interleaved, on a page granularity, across the nodes specified in
>  	the policy.  This mode also behaves slightly differently, based on
> @@ -259,7 +265,10 @@ Components of Memory Policies
>  	    occurs over that node.  If no nodes from the user's nodemask are
>  	    now allowed, the Default behavior is used.
>  
> -	    MPOL_F_STATIC_NODES cannot be used with MPOL_F_RELATIVE_NODES.
> +	    MPOL_F_STATIC_NODES cannot be combined with the
> +	    MPOL_F_RELATIVE_NODES flag.  It also cannot be used for
> +	    MPOL_PREFERRED policies that were created with an empty nodemask
> +	    (local allocation).
>  
>  	MPOL_F_RELATIVE_NODES:  This flag specifies that the nodemask passed
>  	by the user will be mapped relative to the set of the task or VMA's
> @@ -306,7 +315,10 @@ Components of Memory Policies
>  	    set of memory nodes allowed by the task's cpuset, as that may
>  	    change over time.
>  
> -	    MPOL_F_RELATIVE_NODES cannot be used with MPOL_F_STATIC_NODES.
> +	    MPOL_F_RELATIVE_NODES cannot be combined with the
> +	    MPOL_F_STATIC_NODES flag.  It also cannot be used for
> +	    MPOL_PREFERRED policies that were created with an empty nodemask
> +	    (local allocation).
>  
>  MEMORY POLICY APIs
>  
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -151,9 +151,7 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)
>  
>  static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
>  {
> -	if (nodes_empty(*nodes))
> -		return -EINVAL;
> -	pol->v.preferred_node = first_node(*nodes);
> +	pol->v.preferred_node = !nodes_empty(*nodes) ? first_node(*nodes) : -1;
>  	return 0;
>  }
>  
> @@ -176,10 +174,22 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
>  	pr_debug("setting mode %d flags %d nodes[0] %lx\n",
>  		 mode, flags, nodes ? nodes_addr(*nodes)[0] : -1);
>  
> -	if (nodes && nodes_empty(*nodes) && mode != MPOL_PREFERRED)
> -		return ERR_PTR(-EINVAL);
> -	if (mode == MPOL_DEFAULT)
> +	if (mode == MPOL_DEFAULT) {
> +		if (nodes && !nodes_empty(*nodes))
> +			return ERR_PTR(-EINVAL);
>  		return NULL;
> +	}

So, the only time we get a NULL 'nodes' pointer to mpol_new() is when we
pass MPOL_DEFAULT to mpol_shared_policy_init().  The syscall entries
always pass a non-null nodemask pointer to an on-stack nodemask_t.  I've
added a VM_BUG_ON() to assert this in the patch below, as the rest of
mpol_new() assumes a non-NULL 'nodes'.  Maybe should be just 'BUG_ON()'?

> +	/*
> +	 * MPOL_PREFERRED cannot be used with MPOL_F_STATIC_NODES or
> +	 * MPOL_F_RELATIVE_NODES if the nodemask is empty (local allocation).
> +	 * All other modes require a valid pointer to a non-empty nodemask.
> +	 */
> +	if (mode == MPOL_PREFERRED) {
> +		if (nodes_empty(*nodes) && ((flags & MPOL_F_STATIC_NODES) ||
> +					    (flags & MPOL_F_RELATIVE_NODES)))
> +			return ERR_PTR(-EINVAL);
> +	} else if (nodes_empty(*nodes))
> +		return ERR_PTR(-EINVAL);
>  	policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
>  	if (!policy)
>  		return ERR_PTR(-ENOMEM);
> @@ -250,7 +260,7 @@ static void mpol_rebind_preferred(struct mempolicy *pol,
>  	} else if (pol->flags & MPOL_F_RELATIVE_NODES) {
>  		mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
>  		pol->v.preferred_node = first_node(tmp);
> -	} else {
> +	} else if (pol->v.preferred_node != -1) {
>  		pol->v.preferred_node = node_remap(pol->v.preferred_node,
>  						   pol->w.cpuset_mems_allowed,
>  						   *nodes);

Here's a patch, atop the subject patch, that addresses the issues
discussed above.  Tested with the numactl regression "suite", my little
MPOL_DEFAULT error return test and a couple of ad hoc memtoy sessions in
and out of restricted cpusets.  All passed--once I remembered to exit
the restricted cpuset that didn't include nodes 0 & 1, used by the
regression test...

David:  if you agree with this, I can generate a combined patch with
your signoff and mine to replace the previous regression fix which I
believe Andrew has already added to -mm.  Or,

Andrew:  if you prefer, I can generate a patch atop the one you've
already added to the tree to accomplish the same thing.

Please advise...

Lee

-----------------------------------------------------------------------
Against:  2.6.25-rc3-mm1 atop "mempolicy:  disallow static or relative
flags for local preferred allocation - v2"

Restores error checking for MPOL_PREFERRED that specifies only
dis-allowed nodes.  Skips cpuset related mempolicy setup for 
local allocation.

Also, enforce mpol_new() assumption that 'nodes' arg is non-NULL
except for MPOL_DEFAULT.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@...com>

 mm/mempolicy.c |   47 +++++++++++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 16 deletions(-)

Index: linux-2.6.25-rc3-mm1/mm/mempolicy.c
===================================================================
--- linux-2.6.25-rc3-mm1.orig/mm/mempolicy.c	2008-03-10 10:25:58.000000000 -0400
+++ linux-2.6.25-rc3-mm1/mm/mempolicy.c	2008-03-10 12:12:19.000000000 -0400
@@ -158,7 +158,9 @@ static int mpol_new_interleave(struct me
 
 static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
 {
-	pol->v.preferred_node = !nodes_empty(*nodes) ? first_node(*nodes) : -1;
+	if (nodes && nodes_empty(*nodes))
+		return -EINVAL;
+	pol->v.preferred_node = nodes ? first_node(*nodes) : -1;
 	return 0;
 }
 
@@ -176,6 +178,7 @@ static struct mempolicy *mpol_new(unsign
 {
 	struct mempolicy *policy;
 	nodemask_t cpuset_context_nmask;
+	int localalloc = 0;
 	int ret;
 
 	pr_debug("setting mode %d flags %d nodes[0] %lx\n",
@@ -186,36 +189,48 @@ static struct mempolicy *mpol_new(unsign
 			return ERR_PTR(-EINVAL);
 		return NULL;
 	}
+	VM_BUG_ON(!nodes);
+
 	/*
 	 * MPOL_PREFERRED cannot be used with MPOL_F_STATIC_NODES or
 	 * MPOL_F_RELATIVE_NODES if the nodemask is empty (local allocation).
 	 * All other modes require a valid pointer to a non-empty nodemask.
 	 */
 	if (mode == MPOL_PREFERRED) {
-		if (nodes_empty(*nodes) && ((flags & MPOL_F_STATIC_NODES) ||
-					    (flags & MPOL_F_RELATIVE_NODES)))
-			return ERR_PTR(-EINVAL);
+		if (nodes_empty(*nodes)) {
+			if (((flags & MPOL_F_STATIC_NODES) ||
+			     (flags & MPOL_F_RELATIVE_NODES)))
+				return ERR_PTR(-EINVAL);
+			localalloc++;
+		}
 	} else if (nodes_empty(*nodes))
 		return ERR_PTR(-EINVAL);
 	policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
 	if (!policy)
 		return ERR_PTR(-ENOMEM);
 	atomic_set(&policy->refcnt, 1);
-	cpuset_update_task_memory_state();
-	if (flags & MPOL_F_RELATIVE_NODES)
-		mpol_relative_nodemask(&cpuset_context_nmask, nodes,
-				       &cpuset_current_mems_allowed);
-	else
-		nodes_and(cpuset_context_nmask, *nodes,
-			  cpuset_current_mems_allowed);
 	policy->policy = mode;
 	policy->flags = flags;
-	if (mpol_store_user_nodemask(policy))
-		policy->w.user_nodemask = *nodes;
-	else
-		policy->w.cpuset_mems_allowed = cpuset_mems_allowed(current);
+	if (!localalloc) {
+		/*
+		 * cpuset related setup doesn't apply to local allocation
+		 */
+		cpuset_update_task_memory_state();
+		if (flags & MPOL_F_RELATIVE_NODES)
+			mpol_relative_nodemask(&cpuset_context_nmask, nodes,
+					       &cpuset_current_mems_allowed);
+		else
+			nodes_and(cpuset_context_nmask, *nodes,
+				  cpuset_current_mems_allowed);
+		if (mpol_store_user_nodemask(policy))
+			policy->w.user_nodemask = *nodes;
+		else
+			policy->w.cpuset_mems_allowed =
+						cpuset_mems_allowed(current);
+	}
 
-	ret = mpol_ops[mode].create(policy, &cpuset_context_nmask);
+	ret = mpol_ops[mode].create(policy,
+				    localalloc ? NULL : &cpuset_context_nmask);
 	if (ret < 0) {
 		kmem_cache_free(policy_cache, policy);
 		return ERR_PTR(ret);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ