linux-kernel - Re: [patch 3/4] mempolicy: add MPOL_F_STATIC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.1.00.0802130113020.3936@chino.kir.corp.google.com>
Date:	Wed, 13 Feb 2008 01:36:25 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Paul Jackson <pj@....com>
cc:	Lee.Schermerhorn@...com, akpm@...ux-foundation.org,
	clameter@....com, ak@...e.de, linux-kernel@...r.kernel.org,
	mel@....ul.ie
Subject: Re: [patch 3/4] mempolicy: add MPOL_F_STATIC_NODES flag

On Wed, 13 Feb 2008, Paul Jackson wrote:

> The infamous unpublished (except to a few) patch I drafted on Christmas
> (Dec 25, 2007) basically added two new modes for how mempolicy
> nodemasks were to be resolved:
>  1) a static, no remap, mode, such as in this present patchset, and
>  2) a cpuset relative mode that correctly handled the case of cpusets
>     growing larger (increased numbers of nodes.)
> 
> I'd like to persuade you to add case (2) as well.  But I suspect,
> given that case (2) was never as compelling to you as it was to me
> in our December discussions, that I'll have little luck doing that.
> 

MPOL_F_STATIC_NODES already handles the second case because you can 
specify nodes that aren't currently accessible because of your cpuset in 
the hopes that eventually they will become accessible.  It's possible to 
pass a nodemask with all bits set so that when the cpuset's set of nodes 
expand, the mempolicy is effected over the intersection of the two on 
rebind.

Other than that, perhaps if you can elaborate more on what you imagined 
supporting as far as cpusets growing larger (or supporting node hotplug, 
which is the same type of problem), we can discuss that further before I 
resend my patches.

> Notes and questions on your current patchset:
> 
>  1) It looks like you -add- a second nodemask to struct mempolicy:
> 	    nodemask_t cpuset_mems_allowed; /* mempolicy relative to these nodes */
> 	    nodemask_t user_nodemask;       /* nodemask passed by user */
> 
>     I believe you can overlay these two nodemasks using a union, and avoid making
>     struct mempolicy bigger.
> 

Ahh, since policy->cpuset_mems_allowed is only meaningful in the 
non-MPOL_F_STATIC_NODES case, that probably will work.  For the purposes 
of this patchset, we can certainly do that.  I'm wondering whether future 
expansions will require them to be separated again, however.

>  2) Those 'mpol_mode()' wrappers on all those mempolicy->policy
>     evaluations look dangerous to me.  It looks like a code bug
>     waiting to happen.  Unless I miss my guess, if you forget one of
>     those wrappers (or someone making a future change misses one), no
>     one will notice until sometime at runtime, someone makes use of the
>     new fangled MPOL_F_STATIC_NODES mode in some new situation, and we
>     end up executing code that we didn't expect to execute just then.
> 
>     I urge you to reconsider, and keep it so that the 'policy' field of struct
>     mempolicy naturally evaluates to just the policy.  There should be just one
>     pair of places, on entry to, and exit from, the kernel, where the code is
>     aware of the need to pack the mode and the policy into a single word.
> 

Ok.

>  3) Does your patchset allow the user to specify a nodemask that includes nodes
>     outside of the current cpuset with a MPOL_F_STATIC_NODES mode?  At first
>     glance, I didn't see that it did, but I should have looked closer.  This
>     strikes me as an essential aspect of this mode.
> 

It does, and that's why I was a little curious about your second case at 
the beginning of this email.

The user's nodemask is always stored unaltered in policy->user_nodemask.  
In mpol_new(), we intersect the current accessible nodes with that 
nodemask to determine if there's even a mempolicy to effect.  If not, 
mpol_new() returns ERR_PTR(-EINVAL) and we fall back to the existing 
policy (if any) for the VMA or task.  Otherwise, we use the intersection 
of those two nodemasks.

In mpol_rebind_policy() with MPOL_F_STATIC_NODES, we always intersect 
policy->user_nodemask with the set of accessible nodes (nodemask_t 
*newmask).  So if we can now access nodes that we previously couldn't, 
they are now part of the interleave nodemask.  A similiar situation occurs 
when building the zonelist for the MPOL_BIND case and you can see the 
change I made to MPOL_PREFERRED in the incremental patch I sent you.

The only way that newly-accessible nodes will not become a part of the 
mempolicy nodemask is when the user's nodemask and the set of accessible 
nodes is disjoint when the policy was created.

It is arguable whether we want to support that behavior as well (and we 
definitely could do it, it's not out of the scope of mempolicies).  Lee 
had specific requirements of rejecting nodemasks that had no nodes in the 
intersection based on the current implementation, but we could continue 
discussing the possibility of setting up mempolicies that are uneffected 
when they are created and only become policy when they are rebound later.

		David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/