linux-kernel - Re: Update cacheline size on X86


Date:	Fri, 10 Oct 2008 19:45:31 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	Dave Jones <davej@...hat.com>, x86@...nel.org,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: Update cacheline size on X86_GENERIC

On Friday 10 October 2008 18:46, Andi Kleen wrote:
> Nick Piggin <nickpiggin@...oo.com.au> writes:
> > On Friday 10 October 2008 04:14, Dave Jones wrote:
> >> I just noticed that configuring a kernel to use CONFIG_X86_GENERIC
> >> (as is typical for a distro kernel) configures it to use a 128 byte
> >> cacheline size. This made sense when that was commonplace (P4 era) but
> >> current Intel, AMD and VIA CPUs use 64 byte cachelines.
> >
> > I think P4 technically did have 64 byte cachelines, but had some adjacent
> > line prefetching.
>
> The "coherency unit" on P4, which is what matters for SMP alignment
> purposes to avoid false sharing, is 128 bytes.
>
> > And AFAIK core2 CPUs can do similar prefetching (but
> > maybe it's smarter and doesn't cause so much bouncing?).
>
> On Core2 the coherency unit is 64 bytes.

OK.
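
To make the coherency-unit point concrete, here is a minimal user-space sketch in C11, not kernel code. The macro names, struct names and the padded_size() helper are hypothetical; the 128 and 64 byte figures are the P4 and Core2 values quoted above. It shows that a small hot field aligned to the coherency unit occupies a full unit, which is the size cost being weighed against false sharing.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas, alignof (C11) */
#include <stddef.h>   /* size_t */

/* Coherency-unit sizes as quoted in the thread (assumed, not probed). */
#define COHERENCY_P4    128
#define COHERENCY_CORE2  64

/* A hot counter aligned so that nothing else shares its coherency unit. */
struct counter_p4 {
    alignas(COHERENCY_P4) unsigned long value;
};

struct counter_core2 {
    alignas(COHERENCY_CORE2) unsigned long value;
};

/* Round a payload size up to a whole number of coherency units. */
static size_t padded_size(size_t payload, size_t unit)
{
    return (payload + unit - 1) / unit * unit;
}

/* The compiler pads each struct out to its alignment. */
static_assert(sizeof(struct counter_p4) == COHERENCY_P4,
              "P4-aligned counter fills one 128 byte unit");
static_assert(sizeof(struct counter_core2) == COHERENCY_CORE2,
              "Core2-aligned counter fills one 64 byte unit");
```

With these definitions, an 8 byte counter costs 128 bytes under the P4 assumption and 64 bytes under the Core2 one. Choosing 64 on a machine whose coherency unit is really 128 would let two such counters share a unit and bounce between CPUs.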


> > Anyway, GENERIC kernel should run well on all architectures, and while
> > going too big causes slightly increased structures sometimes, going too
> > small could result in horrible bouncing.
>
> Exactly.
>
> That said, it costs one percent or so on TPC, but I think the fix
> for that is just to analyze where the problem is and size those
> data structures based on the runtime cache size. Some subsystems
> like slab do this already.

Costs 1% on TPC? Is that 128 byte aligning data structures on
Core2, or 64 byte aligning them on P4 that costs the performance?
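
The idea of sizing data structures from the runtime cache size can be sketched in user space as follows. This is only an illustration under stated assumptions: it is not how slab does it, all function names are hypothetical, and _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension that may be missing or report 0, in which case the sketch falls back to the conservative 128 bytes. The L1 line size it reports is also not guaranteed to equal the coherency unit discussed above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define FALLBACK_LINE 128 /* conservative default when the size is unknown */

/* Ask the C library for the L1 data cache line size, if it knows it. */
static size_t runtime_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    /* Accept only sane values: a power of two, at least pointer sized. */
    if (n >= (long)sizeof(void *) && (n & (n - 1)) == 0)
        return (size_t)n;
#endif
    return FALLBACK_LINE;
}

/* Distance between consecutive objects so each starts on its own line. */
static size_t slot_stride(size_t obj_size, size_t line)
{
    return (obj_size + line - 1) / line * line;
}

/* True if p sits on a line boundary. */
static int is_line_aligned(const void *p, size_t line)
{
    return ((uintptr_t)p % line) == 0;
}

/* Allocate count objects, one per line-aligned slot; caller frees. */
static void *alloc_slots(size_t count, size_t obj_size, size_t *stride_out)
{
    assert(count > 0 && obj_size > 0);
    size_t line = runtime_line_size();
    size_t stride = slot_stride(obj_size, line);
    /* count * stride is a multiple of line, as aligned_alloc expects. */
    void *p = aligned_alloc(line, count * stride);
    if (p != NULL)
        *stride_out = stride;
    return p;
}
```

A caller would index slot i at (char *)p + i * stride. On a 64 byte machine the array is half the size it would be with a fixed 128 byte stride, which is the kind of saving the runtime approach is after.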


> TPC is a bit of an extreme case because it is so extremely cache bound.

Still, it is a good canary.


> Overall the memory impact of the cache padding is getting less over
> time because more and more data is moving into the per CPU data areas.

Right.
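
A rough user-space picture of why per-CPU data reduces the padding cost: instead of padding every hot variable separately, each CPU's variables are packed together and padded once. The struct, field and function names below are hypothetical, the CPU count is a fixed demo value, and the 128 byte unit is the conservative figure from this thread; the kernel's real per-CPU machinery is not used here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define UNIT          128 /* assumed worst-case coherency unit */
#define DEMO_NR_CPUS    4 /* hypothetical fixed CPU count for the sketch */

/* Old style: every hot variable gets its own padded unit. */
struct padded_long {
    alignas(UNIT) long v;
};

/* Per-CPU style: one CPU's variables share a unit, padded once. */
struct cpu_area {
    alignas(UNIT) long rx_packets;
    long tx_packets;
    long errors;
};

static struct cpu_area areas[DEMO_NR_CPUS];

/* Each CPU touches only its own area, so no unit is shared across CPUs. */
static void count_rx(unsigned int cpu)
{
    areas[cpu].rx_packets++;
}

/* Readers sum over all areas; this is the slow, rare path. */
static long total_rx(void)
{
    long sum = 0;
    for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        sum += areas[cpu].rx_packets;
    return sum;
}

/* Three separately padded counters cost three units; the area costs one. */
static_assert(sizeof(struct cpu_area) == UNIT,
              "three counters fit in a single padded unit");
static_assert(3 * sizeof(struct padded_long) == 3 * UNIT,
              "separate padding costs one unit per counter");
```

Under these assumptions the three counters take 128 bytes per CPU rather than 384, and the gap widens as more variables move into the per-CPU area.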