linux-kernel - Re: [patch 2/2] cpusets: add interleave_over

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20071025181337.b27cd309.pj@sgi.com>
Date:	Thu, 25 Oct 2007 18:13:37 -0700
From:	Paul Jackson <pj@....com>
To:	David Rientjes <rientjes@...gle.com>
Cc:	akpm@...ux-foundation.org, ak@...e.de, clameter@....com,
	Lee.Schermerhorn@...com, linux-kernel@...r.kernel.org
Subject: Re: [patch 2/2] cpusets: add interleave_over_allowed option

I'm probably going to be ok with this ... after a bit.

1) First concern - my primary issue:

    One thing I really want to change, the name of the per-cpuset file
    that controls this option.  You call it "interleave_over_allowed".

    Take a look at the existing per-cpuset file names:

	    $ grep 'name = "' kernel/cpuset.c
	    .name = "cpuset",
	    .name = "cpus",
	    .name = "mems",
	    .name = "cpu_exclusive",
	    .name = "mem_exclusive",
	    .name = "sched_load_balance",
	    .name = "memory_migrate",
	    .name = "memory_pressure_enabled",
	    .name = "memory_pressure",
	    .name = "memory_spread_page",
	    .name = "memory_spread_slab",
	    .name = "cpuset",

    The name of every memory related option starts with "mem" or "memory",
    and the name of every memory interleave related option starts with
    "memory_spread_*".

    Can we call this "memory_spread_user" instead, or something else
    matching "memory_spread_*" ?

    The names of things in the public API's are a big issue of mine.

2) Second concern - lessor code clarity issue:

    The logic surrounding current_cpuset_interleaved_mems() seems a tad
    opaque to me.  It appears on the surface as if the memory policy code,
    in mm/mempolicy.c, is getting a nodemask from the cpuset code by
    calling this routine, as if there were an independent per-cpuset
    nodemask stating over what nodes to interleave for MPOL_INTERLEAVE.

    But all that is returned is either (1) an empty node mask or (2) the
    current tasks allowed cpu mask.  If an empty mask is returned, this
    tells the MPOL_INTERLEAVE code to use the mask the user specified in
    an earlier set_mempolicy MPOL_INTERLEAVE call.  If a non-empty mask
    is returned, then the previous user specified mask is ignored and
    that non-empty mask (just all the current cpusets allowed nodes) is
    used instead.

    Restating this in pseudo code, from your patch, the mempolicy.c
    MPOL_INTERLEAVE code to rebind memory policies after a cpuset
    changes reads:
	tmp = current_cpuset_interleaved_mems();
	if tmp empty:
		rebind over tmp (all the cpusets allowed nodes)
		break;
	rebind over the set_mempolicy MPOL_INTERLEAVE specified mask
	break;

    The above code is assymmetric, and the returning of a nodemask is
    an illusion, suggesting that cpusets might have an interleaved
    nodemask separate from the allowed memory nodemask.

    How about instead of your current_cpuset_interleaved_mems() routine
    that returns a nodemask, rather have a routine that returns a Boolean,
    indicating whether this new flag is set, used as in:
	if (cpuset_is_memory_spread_user())
		tmp = cpuset_current_mems_allowed();
	else
		nodes_remap(tmp, pol->v.nodes, *mpolmask, *newmask);
	pol->v.nodes = tmp;

    I'll wager this saves a few bytes of kernel text space as well.

3) Maybe I haven't had enough caffiene yet third issue:

    The existing kernel code for mm/mempolicy.c:mpol_rebind_policy()
    looks buggy to me.  The node_remap() call for the MPOL_INTERLEAVE
    case seems like it should come before, not after, updating mpolmask
    to the newmask.  Fixing that, and consolidating the multiple lines
    doing "*mpolmask = *newmask" for each case, into a single such line
    at the end of the switch(){} statement, results in the following
    patch.  Could you confirm my suspicions and push this one too.
    It should be a part of your patch set, so we don't waste Andrew's
    time resolving the inevitable patch collisions we'll see otherwise.

--- 2.6.23-mm1.orig/mm/mempolicy.c	2007-10-16 18:55:34.745039423 -0700
+++ 2.6.23-mm1/mm/mempolicy.c	2007-10-25 18:06:08.474742762 -0700
@@ -1741,14 +1741,12 @@ static void mpol_rebind_policy(struct me
 	case MPOL_INTERLEAVE:
 		nodes_remap(tmp, pol->v.nodes, *mpolmask, *newmask);
 		pol->v.nodes = tmp;
-		*mpolmask = *newmask;
 		current->il_next = node_remap(current->il_next,
 						*mpolmask, *newmask);
 		break;
 	case MPOL_PREFERRED:
 		pol->v.preferred_node = node_remap(pol->v.preferred_node,
 						*mpolmask, *newmask);
-		*mpolmask = *newmask;
 		break;
 	case MPOL_BIND: {
 		nodemask_t nodes;
@@ -1773,13 +1771,14 @@ static void mpol_rebind_policy(struct me
 			kfree(pol->v.zonelist);
 			pol->v.zonelist = zonelist;
 		}
-		*mpolmask = *newmask;
 		break;
 	}
 	default:
 		BUG();
 		break;
 	}
+
+	*mpolmask = *newmask;
 }
 
 /*


Thanks.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@....com> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/