Date:   Mon, 29 Nov 2021 14:07:58 -0800
From:   Stephane Eranian <eranian@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, kim.phillips@....com,
        acme@...hat.com, jolsa@...hat.com, songliubraving@...com,
        mpe@...erman.id.au, maddy@...ux.ibm.com
Subject: Re: [PATCH v2 03/13] perf/x86/amd: add AMD Fam19h Branch Sampling support

Peter,

Back from a vacation break. Comments below.

On Thu, Nov 18, 2021 at 4:33 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Nov 18, 2021 at 01:20:09PM +0100, Peter Zijlstra wrote:
> > On Tue, Nov 16, 2021 at 11:23:39PM -0800, Stephane Eranian wrote:
>
> > > Ok, I made the changes you suggested. It looks closer to the way LBR is handled.
> > > However, this means that there is no path by which you can get to
> > > amd_pmu_disable_event()
> > > without having gone through amd_pmu_disable_all(). Is that always the
> > > case? And same thing
> > > on the enable side.
> >
> > So that's true for ->add() and ->del(), those cannot be called without
> > being wrapped in ->pmu_disable(), ->pmu_enable().
> >
> > There is however the ->stop() and ->start() usage for throttling, which
> > can stop an individual event (while leaving the event scheduled on the
> > PMU). Now, I think the ->stop() gets called with the PMU enabled, but
> > the ->start() is with it disabled again.
>
> I just looked, and the throttling depends on the PMU's PMI handler
> implementation, for Intel it will have the PMU disabled, for generic and
> AMD it has it enabled (see x86_pmu_handle_irq -- also these are really
> NMIs but let's not do a mass rename just now).
>
Yes, I see that too. It comes from the __perf_event_overflow() ->
__perf_event_account_interrupt() code path, which does not call
perf_pmu_disable(): it knows it is called from the PMI and lets the PMI
code decide the state of the PMU. Unfortunately, on AMD the PMU is not
fully stopped on PMI because there is no centralized way of doing it, so
you would need 6 wrmsr x 2 to disable/enable. OTOH, I don't see the point
of keeping monitoring active inside the PMI handler itself. So there is a
discrepancy between Intel and AMD here, and I think we should fix it.
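
To illustrate what I mean (rough sketch only, not actual kernel code; the
function name and the hardcoded counter count of 6 are my approximations
for Fam19h, which has no global control MSR), a full disable in the AMD
PMI handler would look roughly like this, plus the mirror loop to
re-enable everything on the way out:

/*
 * Rough sketch, not the actual kernel code: Intel can gate all counters
 * with a single write to MSR_CORE_PERF_GLOBAL_CTRL, while Fam19h has no
 * global control register, so each counter needs its own wrmsr, i.e.
 * 6 wrmsr to disable and 6 more to re-enable.
 */
static void amd_pmi_pmu_disable_sketch(void)
{
	u64 ctl;
	int i;

	for (i = 0; i < 6; i++) {	/* 6 core counters on Fam19h */
		rdmsrl(x86_pmu_config_addr(i), ctl);
		wrmsrl(x86_pmu_config_addr(i),
		       ctl & ~ARCH_PERFMON_EVENTSEL_ENABLE);
	}
	/* mirror loop sets ARCH_PERFMON_EVENTSEL_ENABLE again on exit */
}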

> > The ramification would be that we'd stop the event, but leave BRS
> > enabled for a throttled event. Which should be harmless, no?

Well, the risk here is that if BRS is left enabled, it may hold the NMI
for up to 16 taken branches. If the associated event is disabled, then
cpuc->events[idx] = NULL, and the BRS drain function checks for that and
does not capture any sample. The brs_drain() function is called from the
PMI handler when cpuc->lbr_users > 0, which would still be the case on
the stop() path. I think this is the only risk on the throttling code
path.
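
In other words, the guard looks roughly like this (sketch only, not the
exact amd_brs_drain() body; the function name and layout here are my
approximation):

/*
 * Sketch of the guard described above, not the exact upstream code:
 * the drain path bails out when the owning event has been stopped, so
 * a throttled event with BRS still armed cannot emit a sample; the only
 * cost is the NMI being held for up to 16 taken branches.
 */
static void brs_drain_sketch(struct cpu_hw_events *cpuc, int idx)
{
	struct perf_event *event = cpuc->events[idx];

	if (!event)	/* stopped/throttled: drop the captured branches */
		return;

	/* otherwise copy the captured taken branches into the sample */
}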

There would be no problem if we were to stop/start the PMU in the AMD
PMI handler.

Thanks.
