lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1187195028.5422.30.camel@localhost>
Date:	Wed, 15 Aug 2007 12:23:47 -0400
From:	Lee Schermerhorn <Lee.Schermerhorn@...com>
To:	"Serge E. Hallyn" <serge@...lyn.com>
Cc:	Christoph Lameter <clameter@....com>,
	Dhaval Giani <dhaval@...ux.vnet.ibm.com>, bob.picco@...com,
	nacc@...ibm.com, kamezawa.hiroyu@...fujitsu.com, mel@...net.ie,
	akpm@...ux-foundation.org, Balbir Singh <balbir@...ibm.com>,
	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
	lkml <linux-kernel@...r.kernel.org>,
	ckrm-tech <ckrm-tech@...ts.sourceforge.net>
Subject: Re: Regression in 2.6.23-rc2-mm2, mounting cpusets causes a hang

On Wed, 2007-08-15 at 09:31 -0500, Serge E. Hallyn wrote:
> Quoting Lee Schermerhorn (Lee.Schermerhorn@...com):
> > On Tue, 2007-08-14 at 14:56 -0700, Christoph Lameter wrote:
> > > On Tue, 14 Aug 2007, Lee Schermerhorn wrote:
> > > 
> > > > > Ok then you did not have a NUMA system configured. So its okay for the 
> > > > > dummies to ignore the stuff. CONFIG_NODES_SHIFT is a constant and does not 
> > > > > change. The first bit is always set.
> > > > 
> > > > The first bit [node 0] is only set for the N_ONLINE [and N_POSSIBLE]
> > > > mask.  We could add the static init for the other masks, but since
> > > > non-numa platforms are going through the __build_all_zonelists, they
> > > > might as well set the MEMORY bits explicitly.  Or, maybe you'll
> > > > disagree ;-).
> > > 
> > > The bitmaps can be completely ignored if !NUMA.
> > > 
> > > In the non NUMA case we define
> > > 
> > > static inline int node_state(int node, enum node_states state)
> > > {
> > >         return node == 0;
> > > }
> > > 
> > > So its always true for node 0. The "bit" is set.
> > 
> > The issue is with the N_*_MEMORY masks.  They don't get initialized
> > properly because node_set_state() is a no-op if !NUMA.  So, where we
> > look for intersections with or where we AND with the N_*_MEMORY masks we
> > get the empty set.
> > 
> > > 
> > > We are trying to get cpusets to work with !NUMA?

Sounds reasonable.  

> > 
> > 
> > Well, yes.  In Serge's case, he's trying to use cpusets with !NUMA.
> > He'll have to comment on the reasons for that.  Looking at all of the
> 
> So I can lock a container to a cpu on a non-numa machine.
> 
> > #ifdefs and init/Kconfig, CPUSET does not depend on NUMA--only SMP and
> > CONTAINERS [altho' methinks CPUSET should select CONTAINERS rather than
> > depend on it...].  So, you can use cpusets to partition of cpus in
> > non-NUMA configs.
> > 
> > In the more general case, tho', I'm looking at all uses of the
> > node_online_map and for_each_online_node, for instances where they
> > should be replaced with one of the *_MEMORY masks.  IMO, generic code
> > that is compiled independent of any CONFIG option, like NUMA,  should
> > just work, independent of the config.  Currently, as Serge has shown,
> > this is not the case.  So, I think we should fix the *_MEMORY maps to be
> > correctly populated in both the NUMA and !NUMA cases.  A couple of
> > options:
> > 
> > 1) just use node_set() when populating the masks,
> > 
> > 2) initialize all masks to include at least cpu/node 0 in the !NUMA
> > case.
> > 
> > Serge chose #1 to fix his problem.  I followed his lead to fix the other
> > 2 places where node_set_state() was being used to initialize the NORMAL
> > memory node mask and the CPU node mask.  This will add a few unnecessary
> > instructions to !NUMA configs, so we could change to #2.
> > 
> > Thoughts?
> 
> Paul, is the mems stuff in cpusets only really useful for NUMA cases?
> (I think it is... but am not sure)  If so I suppose one alternative
> could be to just disable that when !NUMA.  But disabling cpusets when
> !NUMA is completely wrong.
> 
> I personally would think that 1) is still the best option.  Otherwise
> the action
> 
> 	echo $SOME_CPU > /cpusets/set1/cpu
> 	echo $SOME_CPU > /cpusets/set1/mems
> 
> works on a numa machine, and is wrong on a non-numa machine.  With
> option 1, the second part doesn't actually restrict the memory, but
> at least /cpusets/set1/mems exists and $SOME_CPU doesn't have to be 0 to
> be valid.

Well, you really shouldn't be writing cpu ids to the cpuset mems file.
Rather, it takes node ids.  And on !NUMA configs, only node 0 exists.

Can you actually write a !0 cpuid to the mems file with the current
option #1 patch [that uses node_set() to populate the node_states[]]?
It should allow something like:

	echo 0,1<and maybe others> >/cpusets/set1/mems

As long as one of the specified node ids has memory, it will silently
ignore any that don't.

If you're up for it, you could try the following patch to statically
initialize the node state masks, in place of the "option 1" patch.  Be
aware, tho', that I've only tested on my ia64 NUMA platform.  I did
compile it [page_alloc.c] without error under allnoconfig.

Lee

-----------------
PATCH	Initialize N_*_MEMORY and N_CPU masks for non-NUMA config

Against:  2.6.23-rc2-mm2

Statically initialize the N_*_MEMORY and N_CPU node state masks
for !NUMA configurations.  This static initialization is required
because the node_set_state() function becomes a no-op for !NUMA.
Other generic code assumes that these masks are set correctly.

Note that in NUMA configurations, these masks will be populated
correctly, so don't bother with static initialization.  No sense
in making assumptions that could be broken down the road, resulting
in extra work for someone to debug.  Unlikely, perhaps, but who
needs the aggravation...

Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@...com>

 mm/page_alloc.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Index: Linux/mm/page_alloc.c
===================================================================
--- Linux.orig/mm/page_alloc.c	2007-08-15 10:01:23.000000000 -0400
+++ Linux/mm/page_alloc.c	2007-08-15 10:05:41.000000000 -0400
@@ -52,7 +52,14 @@
  */
 nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 	[N_POSSIBLE] = NODE_MASK_ALL,
-	[N_ONLINE] = { { [0] = 1UL } }
+	[N_ONLINE] = { { [0] = 1UL } },
+#ifndef CONFIG_NUMA
+	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
+#ifdef CONFIG_HIGHMEM
+	[N_HIGH_MEMORY] = { { [0] = 1UL } },
+#endif
+	[N_CPU] = { { [0] = 1UL } },
+#endif	/* NUMA */
 };
 EXPORT_SYMBOL(node_states);
 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ