[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 10 Nov 2006 16:12:19 -0800
From: "Bela Lubkin" <blubkin@...are.com>
To: "Andi Kleen" <ak@...e.de>
Cc: <linux-kernel@...r.kernel.org>, <mingo@...e.hu>
Subject: RE: touch_cache() only touches two thirds
Andi Kleen wrote:
> "Bela Lubkin" <blubkin@...are.com> writes:
> >
> > /*
> > * Dirty a big buffer in a hard-to-predict (for the L2 cache) way. This
> > * is the operation that is timed, so we try to generate unpredictable
> > * cachemisses that still end up filling the L2 cache:
> > */
>
> The comment is misleading anyways. AFAIK several of the modern
> CPUs (at least K8, later P4s, Core2, POWER4+, PPC970) have prefetch
> predictors advanced enough to follow several streams forward and backwards
> in parallel.
>
> I hit this while doing NUMA benchmarking for example.
>
> Most likely to be really unpredictable you need to use a
> true RND and somehow make sure still the full cache range
> is covered.
The corrected code in <http://bugzilla.kernel.org/show_bug.cgi?id=7476#c4>
covers the full cache range. Granted that modern CPUs may be able to track
multiple simultaneous cache access streams: how many such streams are they
likely to be able to follow at once? It seems like going from 1 to 2 would
be a big win, 2 to 3 a small win, beyond that it wouldn't likely make much
incremental difference. So what do the actual implementations in the field
support?
The code (original and corrected) uses 6 simultaneous streams.
I have a modified version that takes a `ways' parameter to use an arbitrary
number of streams. I'll post that onto bugzilla.kernel.org.
>Bela<
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists