Message-ID: <alpine.DEB.1.00.0802131314210.5914@chino.kir.corp.google.com>
Date:	Wed, 13 Feb 2008 13:35:41 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Paul Jackson <pj@....com>
cc:	Lee.Schermerhorn@...com, akpm@...ux-foundation.org,
	clameter@....com, ak@...e.de, linux-kernel@...r.kernel.org,
	mel@....ul.ie
Subject: Re: [patch 3/4] mempolicy: add MPOL_F_STATIC_NODES flag

On Wed, 13 Feb 2008, Paul Jackson wrote:
> Yes, if an application considers nodes to be interchangeable, I'm
> trying to avoid having that application -have- to know its current
> cpuset placement, for two reasons:
> 
>     For one thing, it's racy.  Its cpuset placement could change,
>     unbeknownst to it, between the time it queried that placement
>     and the time it issued the mbind or set_mempolicy call.
> 
>     For the other thing, it's not always possible.  If the application
>     is currently in a cpuset that is smaller than its preferred
>     configuration, it would not be possible to express its preferred
>     memory policies using just the smaller number of memory nodes
>     allowed by its current cpuset placement.  How do you say "put
>     this on my third node" if you don't have a third node and you
>     can only speak of the nodes you currently have?
> 
So let's say, as in my first example from the previous email, that you 
have MPOL_INTERLEAVE | MPOL_F_RELATIVE_NODES over nodes 3-4 and your 
cpuset's mems contains only nodes 5-7.  This would interleave over no 
nodes, correct?
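
To make the arithmetic concrete, here is a minimal userspace sketch of 
that no-wrap reading, where relative node N means "the Nth node of the 
cpuset's mems".  The flat unsigned-long-long nodemask and the loop are 
illustrative assumptions, not code from this series:

	#include <stdio.h>

	int main(void)
	{
		unsigned long long mems = 1ULL << 5 | 1ULL << 6 | 1ULL << 7;
		unsigned long long relative = 1ULL << 3 | 1ULL << 4;
		unsigned long long result = 0;
		int bit, idx = 0;

		/* Walk mems in order; the idx'th set bit of mems is
		 * "relative node idx".  Relative bits at or beyond the
		 * weight of mems map to nothing under this reading. */
		for (bit = 0; bit < 64; bit++) {
			if (!(mems & (1ULL << bit)))
				continue;
			if (relative & (1ULL << idx))
				result |= 1ULL << bit;
			idx++;
		}
		printf("interleave set: %#llx\n", result);	/* prints 0 */
		return 0;
	}
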
It seems like MPOL_F_RELATIVE_NODES is primarily designed to maintain a 
certain order among the nodes over which it applies the mempolicy.  It 
comes with the premise that the task doesn't already know its cpuset 
mems (otherwise, the current implementation without MPOL_F_STATIC_NODES 
would work fine for this), so it doesn't really care which nodes it 
allocates pages on; it just cares about the order.
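
If that's the intent, one order-preserving remap would fold 
out-of-range relative bits onto the mems modulo its weight.  To be 
clear, this folding (and the helper name) is an assumption about where 
the semantics could go, not something settled in this thread:

	/* Fold each relative bit onto the cpuset mems modulo the
	 * number of allowed nodes, so relative order survives even
	 * when the cpuset is smaller than the policy's nodemask. */
	static unsigned long long fold_relative(unsigned long long relative,
						unsigned long long mems,
						int maxnode)
	{
		int weight = __builtin_popcountll(mems);
		unsigned long long result = 0;
		int bit;

		if (!weight)
			return 0;
		for (bit = 0; bit < maxnode; bit++) {
			int target, idx, b;

			if (!(relative & (1ULL << bit)))
				continue;
			/* Select the (bit % weight)'th set bit of mems. */
			target = bit % weight;
			idx = 0;
			for (b = 0; b < maxnode; b++) {
				if (!(mems & (1ULL << b)))
					continue;
				if (idx++ == target) {
					result |= 1ULL << b;
					break;
				}
			}
		}
		return result;
	}

With relative = {3,4} and mems = {5,6,7}, this would yield {5,6}: the 
interleave lands on the first two nodes of the cpuset, in relative 
order, rather than on no nodes at all.
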
This works for MPOL_PREFERRED and MPOL_BIND as well, right?
I don't understand the use case for this (at all), but if you have 
workloads that require this type of setting then I can implement it as 
part of my series.  I just want to confirm that there are real-world 
cases backing this so that we don't end up with flags for highly 
specialized corner cases.

 [ If a user _does_ specify MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES
   as part of their syscall, then we'll simply return -EINVAL. ]
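
A sketch of that check; the bit values and the helper are illustrative 
assumptions, written userspace-style rather than as the actual patch:

	#include <errno.h>

	#define MPOL_F_STATIC_NODES	(1 << 15)
	#define MPOL_F_RELATIVE_NODES	(1 << 14)
	#define MPOL_MODE_FLAGS \
		(MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES)

	/* Reject a mode word whose optional flag bits ask for both
	 * interpretations at once; they are mutually exclusive. */
	static int check_mode_flags(int mode)
	{
		if ((mode & MPOL_F_STATIC_NODES) &&
		    (mode & MPOL_F_RELATIVE_NODES))
			return -EINVAL;
		return 0;
	}
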
> > Well, I didn't cave on anything 
> 
> ;)   Your simple "ok" was ambiguous enough that we were able to
>      read into it whatever we wanted to.
> 
> But I've made my case on that issue (involving the separate or
> packed policy flag field).  So I probably won't say more, and
> I expect to live with whatever you choose, after any further
> input from Lee or others.
> 
Well, there are advantages and disadvantages to either approach.

My preference (both mode and flags stored in the same member of struct 
mempolicy):
   Advantages:
	- completely consistent with the userspace API of passing modes
	  and flags together in a pointer to an int, and
	- does not require additional formal parameters to be added to
	  several functions, including functions outside mm/mempolicy.c.
   Disadvantage:
	- use of mpol_mode() throughout mm/mempolicy.c to mask off
	  optional mode flags in conditionals and switch statements
	  (see the sketch after the snippet below).
Your preference (separate mode and flags members in struct mempolicy):
   Advantage:
	- clearer implementation when dealing with modes: all existing
	  statements involving pol->policy can remain unchanged.
   Disadvantages:
	- requires additional formal parameters to be added to several
	  functions, including functions outside mm/mempolicy.c, and
	- takes additional space in struct mempolicy (two bytes) which
	  could eventually be used for something else.

In both cases the testing of mode flags is the same as before:
	if (pol->policy & MPOL_F_STATIC_NODES) {
		...
	}
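
For reference, here is a minimal sketch of the two layouts side by 
side, reusing the flag definitions from the earlier sketch; the struct 
and macro names are illustrative, not final:

	/* Packed (my preference): mode and optional flags share the
	 * one field the user already passes in; mpol_mode() masks the
	 * flags off wherever the raw mode is needed. */
	#define mpol_mode(mode)	((mode) & ~MPOL_MODE_FLAGS)

	struct mempolicy_packed {
		short policy;		/* mode | optional flags */
	};

	/* Separate (your preference): conditionals on pol->policy
	 * stay unmasked, at the cost of a second two-byte member and
	 * the extra formal parameters it drags through callers. */
	struct mempolicy_separate {
		short policy;		/* mode only */
		short flags;		/* optional flags only */
	};

With the separate layout the test above would read pol->flags rather 
than pol->policy, but the shape of the conditional is the same.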