Message-ID: <20091119100056.GA5602@wotan.suse.de>
Date:	Thu, 19 Nov 2009 11:00:56 +0100
From:	Nick Piggin <npiggin@...e.de>
To:	Jan Beulich <JBeulich@...ell.com>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Arjan van de Ven <arjan@...radead.org>, tglx@...utronix.de,
	Shai Fultheim <shai@...lemp.com>,
	Ravikiran Thirumalai <kiran@...lex86.org>,
	linux-kernel@...r.kernel.org, hpa@...or.com
Subject: Re: [PATCH] x86: eliminate redundant/contradicting cache line size config options

On Thu, Nov 19, 2009 at 08:38:14AM +0000, Jan Beulich wrote:
> >>> Nick Piggin <npiggin@...e.de> 19.11.09 09:13 >>>
> >On Wed, Nov 18, 2009 at 08:52:40PM -0800, Arjan van de Ven wrote:
> >Basically what I think we should do is consider L1_CACHE_BYTES to be
> >*the* correct default value to use for 1) avoiding false sharing (which
> >seems to be the most common use), and 2) optimal and repeatable per-object
> >packing into cachelines (which is more of a micro-optimization to be
> >applied carefully to really critical structures).
> 
> But then this really shouldn't be called L1_CACHE_... Though I realize
> that the naming seems to already be broken - looking over the cache
> line specifiers for CPUID leaf 2, there's really no L1 with 128 byte lines,
> just two L2s.

Yes, I agree L1_CACHE is not the best name. In what situation would
you care *only* about the L1 cache line size, without knowing any of
the other line sizes? IMO only when you also know more details about
the L1 cache, such as its size, and are writing some particular
cache-blocking algorithm or the like. And we don't really do that in
the kernel, especially not in generic code.
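
To make the two uses quoted above concrete, here is a minimal
userspace sketch (not kernel code; the struct and field names are made
up for illustration). It assumes a 64-byte line where the kernel would
use L1_CACHE_BYTES, and plain aligned attributes where the kernel
would use ____cacheline_aligned_in_smp:

#include <stdio.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64	/* assumed; the kernel uses L1_CACHE_BYTES */

/* 1) Avoiding false sharing: keep two independently written counters
 *    on separate lines so writes from different CPUs don't bounce the
 *    same line back and forth. */
struct stats {
	unsigned long rx_packets __attribute__((aligned(CACHE_LINE_BYTES)));
	unsigned long tx_packets __attribute__((aligned(CACHE_LINE_BYTES)));
};

/* 2) Repeatable per-object packing: size and align the whole object to
 *    one line so an array of them never straddles line boundaries. */
struct hot_entry {
	unsigned long key;
	unsigned long val;
	char pad[CACHE_LINE_BYTES - 2 * sizeof(unsigned long)];
} __attribute__((aligned(CACHE_LINE_BYTES)));

int main(void)
{
	printf("offset(rx)=%zu offset(tx)=%zu sizeof(hot_entry)=%zu\n",
	       offsetof(struct stats, rx_packets),
	       offsetof(struct stats, tx_packets),
	       sizeof(struct hot_entry));
	return 0;
}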

 
> One question however is whether e.g. cache line ping-pong between
> L3s is really costing that much on non-NUMA, as opposed to it
> happening between L1s.

Well, I think we still need to work to minimise intra-chip bouncing
even though it is far cheaper than inter-chip bouncing. It is still
costly and probably costs more power too. And as core counts continue
to increase, I think even intra-chip bouncing costs are going to
become important (the 8-core Nehalem, I think, already doesn't have a
true unified L3 cache with crossbars to each core, but rather 8 L3
caches connected by ring buses).
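
As a rough illustration of that cost, here is a userspace sketch
(assuming 64-byte lines; all names are made up) that times two threads
hammering counters which either share a line or sit on separate lines.
Build with something like cc -O2 -pthread:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define LINE  64		/* assumed line size */
#define ITERS 100000000UL

/* Two counters on the same line (will bounce) vs. on separate lines. */
static struct { volatile unsigned long a, b; } same_line;
static struct {
	volatile unsigned long a;
	char pad[LINE];
	volatile unsigned long b;
} separate_lines;

static void *bump(void *p)
{
	volatile unsigned long *ctr = p;
	for (unsigned long i = 0; i < ITERS; i++)
		(*ctr)++;
	return NULL;
}

static void run(volatile unsigned long *x, volatile unsigned long *y,
		const char *name)
{
	struct timespec t0, t1;
	pthread_t ta, tb;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	pthread_create(&ta, NULL, bump, (void *)x);
	pthread_create(&tb, NULL, bump, (void *)y);
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("%-16s %.2fs\n", name,
	       (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
}

int main(void)
{
	run(&same_line.a, &same_line.b, "same line:");
	run(&separate_lines.a, &separate_lines.b, "separate lines:");
	return 0;
}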

I don't think it makes much sense to add complexity just to say "oh,
we don't care about bouncing between threads on a core or between
cores on a chip", because I haven't seen anywhere we could get a
significant data-size benefit from that, and it often slows down
straight-line performance too (e.g. per-cpu variables can often be
updated non-atomically, but once you share one even between threads
on a core you have to start using atomics).
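
As a rough userspace analogue of that last point (the function names
here are made up; the kernel primitives would be this_cpu_inc() versus
atomic_inc()):

#include <stdatomic.h>
#include <stdio.h>

/* Per-thread counter: a stand-in for a per-cpu variable. */
static _Thread_local unsigned long local_events;

/* Counter shared between threads: needs an atomic RMW. */
static atomic_ulong shared_events;

static inline void count_event_local(void)
{
	local_events++;		/* plain add, the line stays local */
}

static inline void count_event_shared(void)
{
	/* atomic add: the cache line must be owned exclusively, so it
	 * bounces between whichever cores update the counter */
	atomic_fetch_add_explicit(&shared_events, 1, memory_order_relaxed);
}

int main(void)
{
	count_event_local();
	count_event_shared();
	printf("local=%lu shared=%lu\n", local_events,
	       (unsigned long)atomic_load(&shared_events));
	return 0;
}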
