linux-kernel - Re: The performance and behaviour of the anti-fragmentation related patches

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.64.0703021051580.3953@woody.linux-foundation.org>
Date:	Fri, 2 Mar 2007 11:03:58 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Mark Gross <mgross@...ux.intel.com>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Balbir Singh <balbir@...ibm.com>, Mel Gorman <mel@...net.ie>,
	npiggin@...e.de, clameter@...r.sgi.com, mingo@...e.hu,
	jschopp@...tin.ibm.com, arjan@...radead.org, mbligh@...igh.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: The performance and behaviour of the anti-fragmentation related
 patches

On Fri, 2 Mar 2007, Mark Gross wrote:
> 
> I think there will be more than just 2 dims per cpu socket on systems
> that care about this type of capability.

I agree. I think you'll have a nice mix of 2 and 4, although not likely a 
lot more. You want to have independent channels, and then within a channel 
you want to have as close to point-to-point as possible. 

But the reason that I think you're better off looking at a "node level" is 
that 

 (a) describing the DIMM setup is a total disaster. The interleaving is 
     part of it, but even in the absense of interleaving, we have so far 
     seen that describing DIMM mapping simply isn't a realistic thing to 
     be widely deplyed, judging by the fact that we cannot even get a 
     first-order approximate mapping for the ECC error events.

     Going node-level means that we just piggy-back on the existing node 
     mapping, which is a lot more likely to actually be correct and 
     available (ie you may not know which bank is bank0 and how the 
     interleaving works, but you usually *do* know which bank is connected 
     to which CPU package)

     (Btw, I shouldn't have used the word "die", since it's really about 
     package - Intel obviously has a penchant for putting two dies per 
     package)

 (b) especially if you can actually shut down the memory, going node-wide 
     may mean that you can shut down the CPU's too (ie per-package sleep). 
     I bet the people who care enough to care about DIMM's would want to 
     have that *anyway*, so tying them together simplifies the problem.

> BTW I hope we aren't talking past each other, there are low power states
> where the ram contents are persevered.

Yes. They are almost as hard to handle, but the advantage is that if we 
get things wrong, it can still work most of the time (ie we don't have to 
migrate everything off, we just need to try to migrate the stuff that gets 
*used* off a DIMM, and hardware will hopefully end up quiescing the right 
memory controller channel totally automatically, without us having to know 
the exact mapping or even having to 100% always get it 100% right).

With FBDIMM in particular, I guess the biggest power cost isn't actually 
the DRAM content, but just the controllers.

Of course, I wonder how much actual point there is to FBDIMM's once you 
have on-die memory controllers and thus the reason for deep queueing is 
basically gone (since you'd spread out the memory rather than having it 
behind a few central controllers).

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/