lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 9 Mar 2022 15:03:39 -0800
From:   Stephane Eranian <eranian@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, kim.phillips@....com,
        acme@...hat.com, jolsa@...hat.com, songliubraving@...com
Subject: Re: [PATCH v6 06/12] perf/x86/amd: add AMD branch sampling period adjustment

On Fri, Mar 4, 2022 at 7:45 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Wed, Feb 09, 2022 at 04:32:04PM +0100, Peter Zijlstra wrote:
> > On Tue, Feb 08, 2022 at 01:16:31PM -0800, Stephane Eranian wrote:
> > > Add code to adjust the sampling event period when used with the Branch
> > > Sampling feature (BRS). Given the depth of the BRS (16), the period is
> > > reduced by that depth such that in the best case scenario, BRS saturates at
> > > the desired sampling period. In practice, though, the processor may execute
> > > more branches. Given a desired period P and a depth D, the kernel programs
> > > the actual period at P - D. After P occurrences of the sampling event, the
> > > counter overflows. It then may take X branches (skid) before the NMI is
> > > caught and held by the hardware and BRS activates. Then, after D branches,
> > > BRS saturates and the NMI is delivered.  With no skid, the effective period
> > > would be (P - D) + D = P. In practice, however, it will likely be (P - D) +
> > > X + D. There is no way to eliminate X or predict X.
> > >
> > > Signed-off-by: Stephane Eranian <eranian@...gle.com>
> > > ---
> > >  arch/x86/events/core.c       |  7 +++++++
> > >  arch/x86/events/perf_event.h | 12 ++++++++++++
> > >  2 files changed, 19 insertions(+)
> > >
> > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > > index c2a890caeb0a..ed285f640efe 100644
> > > --- a/arch/x86/events/core.c
> > > +++ b/arch/x86/events/core.c
> > > @@ -1374,6 +1374,13 @@ int x86_perf_event_set_period(struct perf_event *event)
> > >         x86_pmu.set_topdown_event_period)
> > >             return x86_pmu.set_topdown_event_period(event);
> > >
> > > +   /*
> > > +    * decrease period by the depth of the BRS feature to get
> > > +    * the last N taken branches and approximate the desired period
> > > +    */
> > > +   if (has_branch_stack(event))
> > > +           period = amd_brs_adjust_period(period);
> > > +
> > >     /*
> > >      * If we are way outside a reasonable range then just skip forward:
> > >      */
> > > diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> > > index 3485a4cf0241..25b037b571e4 100644
> > > --- a/arch/x86/events/perf_event.h
> > > +++ b/arch/x86/events/perf_event.h
> > > @@ -1263,6 +1263,14 @@ static inline bool amd_brs_active(void)
> > >     return cpuc->brs_active;
> > >  }
> > >
> > > +static inline s64 amd_brs_adjust_period(s64 period)
> > > +{
> > > +   if (period > x86_pmu.lbr_nr)
> > > +           return period - x86_pmu.lbr_nr;
> > > +
> > > +   return period;
> > > +}
> >
> > This makes no sense to me without also enforcing that the event is in
> > fact that branch retired thing.
>
> So what are we going to do with all these patches? Note that I did pick
> them up for testing and I've fixed at least 2 build problems with them.
>
> But I still don't think they're actually completely sane. So there's the
> above issue, subtracting lbr_nr from a random event just makes no sense.


You are right. Initially, I had it such that only retired_branch_taken was
the only event possible. In that case, subtracting lbr_nr made sense.
Since, I have relaxed the event but it exposes this problem. I think
given how BRS works, I am okay restricting to retired_br_taken
because no matter what the hw is going to activate at P (period)
and wait for 16  taken branches before delivering the NMI. So if I
am sampling on cycles with P=1000000, then the NMI is delivered
at P + X + Z, where X = number of cycles elapsed for the 16 taken
branches (unpredictable) and Z the interrupt skid for NMI (which is
extremely big on AMD). With retired_branch_taken, that formula
becomes: P + 16 + Z, where Z is the number of taken branches
during the skid. But given BRS saturates when full, you do lose
the content because of the Z skid. My opinion is we keep the
lbr_nr subtraction and force event to be only retired_branch_taken.

> But there's also the whole exclusion thing, IIRC you're making it
> exclusive against other LBR users, but AFAICT having one LBR user active
> will completely screw over any other sampling event due to introducing
> these massive skids.


The skid is not massive compared to the actual skid of regular interrupt-based
sampling. You are looking at the time it takes to execute 16 taken branches
vs. 2000+ cycles for the NMI skid.  And this would happen only if the other
events overflow during that 16 taken branch window.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ