Message-ID: <b0580124d8916d523afb98979a24f8108b0e700a.camel@arm.com>
Date: Fri, 26 Apr 2024 11:11:32 +0000
From: Ben Gainey <Ben.Gainey@....com>
To: "ak@...ux.intel.com" <ak@...ux.intel.com>
CC: "alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>,
"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"peterz@...radead.org" <peterz@...radead.org>, Mark Rutland
<Mark.Rutland@....com>, "mingo@...hat.com" <mingo@...hat.com>, James Clark
<James.Clark@....com>, "acme@...nel.org" <acme@...nel.org>,
"namhyung@...nel.org" <namhyung@...nel.org>, "jolsa@...nel.org"
<jolsa@...nel.org>, "will@...nel.org" <will@...nel.org>, "irogers@...gle.com"
<irogers@...gle.com>, "adrian.hunter@...el.com" <adrian.hunter@...el.com>,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC PATCH v2 0/4] A mechanism for efficient support for
per-function metrics
On Tue, 2024-04-23 at 08:42 -0700, Andi Kleen wrote:
> > Cursory testing on a Xeon(R) W-2145 with a 300 *instruction* sample
> > window (with and without the patch) suggests this approach would work
> > for some counters. Calculating branch miss rates for example appears to
> > be correct when used with the instruction counter as the sampling event,
> > or at least this approach correctly identifies which functions in the
> > test benchmark are prone to poor predictability. On the other hand the
> > combination cycle and instructions counter does not appear to sample
> > correctly as a pair. With something like
> >
> > perf record -e '{cycles/period=999700,alt-period=300/,instructions}:uS' ... benchmark
> >
> > I often see very large CPI, the same affect is observed without the
> > patch enabled. No idea whats going on there, so any insight welcome...
>
> My guess would be that the PMI handler cleared L1 and there are stalls
> reloading the working set. You can check L1 miss events to confirm.
> Unfortunately with the period change it cannot use multi-record
> PEBS which would avoid the need for a PMI.
>
> -Andi
Hi Andi,
Spent a bit of time looking at this.
Comparing the L1 counters against the values from 'perf stat' doesn't
appear to show any obvious cause.
I think this is just a quirk specific to using the cycle counter as the
sampling event, and is not related to the alt-period, as the effect is
present even on an unpatched kernel.
There appears to be some non-linear increase in CPI (over the sample
data as a whole) for the smallest values of period, e.g. for
period=100, CPI=~450; perf stat says it should be ~2.5. Manual
inspection of the raw data with:
perf script -F event,period -i perf.data.100
shows a repeating pattern along the lines of:
cycles=450
instructions=1
...
The effect quickly decreases as the period increases; with period=750,
the CPI is <2x (vs perf stat).
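For anyone wanting to reproduce the "CPI over the sample data as a
whole" figure, one way is to sum the periods of all cycles samples and
divide by the sum of the periods of all instructions samples. The
Python below is only a sketch of that arithmetic, not necessarily the
script used for the numbers above. It assumes each line of the
'perf script -F event,period' output has the form "<period> <event>:",
which may differ between perf versions; the function names are made up
for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape: "<period> <event-name>:" (may vary with perf version).
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+?):\s*$")


def sum_periods(lines):
    """Sum the sample periods per counter (cycles / instructions)."""
    totals = defaultdict(int)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that does not look like a sample line
        period, event = int(match.group(1)), match.group(2)
        # Event names may carry modifiers (e.g. "cycles:u"), so match by prefix.
        if event.startswith("cycles"):
            totals["cycles"] += period
        elif event.startswith("instructions"):
            totals["instructions"] += period
    return totals


def aggregate_cpi(lines):
    """Return total cycles / total instructions, or None without instructions."""
    totals = sum_periods(lines)
    if totals["instructions"] == 0:
        return None
    return totals["cycles"] / totals["instructions"]


# The repeating pattern shown above, three times over.
sample = ["       450 cycles:", "         1 instructions:"] * 3
print(aggregate_cpi(sample))  # 450.0
```

Dividing the result by the CPI that perf stat reports (~2.5 here) gives
the overhead factor quoted in this mail.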
When the events are swapped so that the sampling event is
`instructions` rather than `cycles`, the effect is very much
diminished/gone; at P=100 I see about 3.5x overhead (vs perf stat),
and at P=500 the overhead is about 1.5x.
When alt-period is used such that
"period=$((1000000-$P)),alt-period=$P", the effect is unchanged.
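To make the period arithmetic explicit: the two values are chosen so
that period + alt-period always adds up to 1000000. The sketch below
builds the event group string for a few values of P, using the same
syntax as the perf record command quoted earlier in this thread. The
helper name is hypothetical, and the alt-period term is only understood
by a perf/kernel with this RFC series applied.

```python
TOTAL_PERIOD = 1_000_000  # from "period=$((1000000-$P))"


def event_group(alt_period, total=TOTAL_PERIOD):
    """Build the '{cycles/.../,instructions}:uS' group for a given P."""
    if not 0 < alt_period < total:
        raise ValueError("alt_period must be between 0 and the total period")
    period = total - alt_period
    return f"{{cycles/period={period},alt-period={alt_period}/,instructions}}:uS"


for p in (100, 500, 750):
    print(event_group(p))
```

With P=300 this yields the event string from the quoted command,
'{cycles/period=999700,alt-period=300/,instructions}:uS'.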
Regards
Ben