Message-ID: <CAKTKpr70=hFdwq43SBM-1jmbNxc1suyn3XouQhsj2m4tM+jeUg@mail.gmail.com>
Date: Thu, 7 Nov 2019 07:45:07 -0800
From: Ganapatrao Kulkarni <gklkml16@...il.com>
To: Mark Rutland <mark.rutland@....com>
Cc: Ganapatrao Prabhakerrao Kulkarni <gkulkarni@...vell.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"will@...nel.org" <will@...nel.org>,
"corbet@....net" <corbet@....net>
Subject: Re: [PATCH 1/2] perf/core: Adding capability to disable PMUs event multiplexing
On Thu, Nov 7, 2019 at 6:52 AM Mark Rutland <mark.rutland@....com> wrote:
>
> On Wed, Nov 06, 2019 at 03:28:46PM -0800, Ganapatrao Kulkarni wrote:
> > Hi Peter, Mark,
> >
> > On Wed, Nov 6, 2019 at 3:28 AM Mark Rutland <mark.rutland@....com> wrote:
> > >
> > > On Wed, Nov 06, 2019 at 01:01:40AM +0000, Ganapatrao Prabhakerrao Kulkarni wrote:
> > > > When PMUs are registered, perf core enables event multiplexing
> > > > support by default. There is no provision for a PMU to disable
> > > > event multiplexing if it needs to, due to unavoidable circumstances
> > > > such as hardware errata.
> > > >
> > > > Add a PMU capability flag, PERF_PMU_CAP_NO_MUX_EVENTS, and support
> > > > to allow PMUs to explicitly disable event multiplexing.
> > >
> > > Even without multiplexing, this PMU activity can happen when switching
> > > tasks, or when creating/destroying events, so as-is I don't think this
> > > makes much sense.
> > >
> > > If there's an erratum whereby heavy access to the PMU can lock up the
> > > core, and it's possible to work around that by minimizing accesses,
> > > that should be done in the back-end PMU driver.
> >
> > As described in the erratum, if there is heavy access to memory (e.g. a
> > stream application running) and the PMU control registers are also
> > accessed frequently, then a CPU lockup is seen.
>
> Ok. So the issue is the frequency of access to those registers.
>
> Which registers does that apply to?
The control registers which are used to start and stop the counters, and
the register which is used to set the event type.
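To make that concrete, the start/stop paths reduce to register writes of
roughly this shape (a sketch only; the register names, offsets and struct
below are placeholders for illustration, not the actual ThunderX2 driver
code):

#include <linux/io.h>
#include <linux/perf_event.h>

/* Placeholder offsets, for illustration only */
#define REG_EVENT_TYPE		0x0	/* selects the event to count */
#define REG_COUNTER_CTL		0x4	/* starts/stops the counter */
#define CTL_ENABLE		0x1

struct uncore_pmu {
	void __iomem *base;
};

/* Called from pmu->add(): program the event type, then start the counter */
static void uncore_event_start(struct uncore_pmu *pmu, struct perf_event *event)
{
	writel(event->hw.config, pmu->base + REG_EVENT_TYPE);
	writel(CTL_ENABLE, pmu->base + REG_COUNTER_CTL);
}

/* Called from pmu->del(): stop the counter */
static void uncore_event_stop(struct uncore_pmu *pmu)
{
	writel(0, pmu->base + REG_COUNTER_CTL);
}

Bursts of writes like these, while the memory subsystem is saturated, are
what trigger the erratum.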
>
> Is this the case for only reads, only writes, or both?
It is a write issue: the h/w block has limited write buffers, and
overflowing them puts the hardware in a weird state when memory
transactions are high.
>
> Does the frequency of access actually matter, or is it just more likely
> that we see the issue with a greater number of accesses? i.e. the
> increased frequency increases the probability of hitting the issue.
This is one scenario.
Any high rate of access to the PMU registers while memory is busy needs
to be controlled.
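For reference, the problem can be reproduced with runs of this shape (the
event names below are illustrative, not the real ThunderX2 uncore event
names):

# 4 events: fits within the hardware counters, no multiplexing
perf stat -e uncore_ev0,uncore_ev1,uncore_ev2,uncore_ev3 ./stream

# 6 events: more events than counters, so perf core multiplexes them and
# calls pmu->add()/pmu->del() on every rotation
perf stat -e uncore_ev0,uncore_ev1,uncore_ev2,uncore_ev3,uncore_ev4,uncore_ev5 ./stream

In the multiplexed case, perf stat marks each event's output with the
percentage of time it was actually scheduled (e.g. "(50.02%)").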
>
> I'd really like a better description of the HW issue here.
>
> > I ran perf stat with 4 events of the thunderx2 PMU, as well as with 6
> > events, for a stream application.
> > For the 4-event run there is no event multiplexing, whereas for the
> > 6-event run the events are multiplexed.
> >
> > For 4 event run:
> > No of times pmu->add is called: 10
> > No of times pmu->del is called: 10
> > No of times pmu->read is called: 310
> >
> > For 6 events run:
> > No of times pmu->add is called: 5216
> > No of times pmu->del is called: 5216
> > No of times pmu->read is called: 5216
> >
> > The issue happens when add and del are called too many times, as seen
> > in the 6-event case.
>
> Sure, but I can achieve something similar by creating/destroying events
> in a loop. Multiplexing is _one_ way to cause this behaviour, but it's
> not the _only_ way.
I agree, there may be other use cases too; however, I am trying to fix
the common case.
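For reference, the intent of this series is that a driver affected by such
an erratum opts out at registration time, roughly like this (a minimal
sketch using the PERF_PMU_CAP_NO_MUX_EVENTS flag proposed in this patch;
the surrounding driver code is illustrative):

#include <linux/init.h>
#include <linux/perf_event.h>

static struct pmu tx2_uncore_pmu = {
	/* Opt out of event multiplexing: with this flag set, perf core
	 * would not rotate events on this PMU, avoiding the frequent
	 * pmu->add()/pmu->del() control-register writes. */
	.capabilities	= PERF_PMU_CAP_NO_MUX_EVENTS,
	.task_ctx_nr	= perf_invalid_context,
	/* .event_init, .add, .del, .start, .stop, .read as usual */
};

static int __init tx2_uncore_init(void)
{
	/* -1: let perf core allocate a dynamic PMU type id */
	return perf_pmu_register(&tx2_uncore_pmu, "tx2_uncore", -1);
}

The idea is that PMUs which do not set the flag keep the current
behaviour.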
>
> > The PMU hardware control registers are programmed when the add and del
> > functions are called.
> > There is no issue for pmu->read, since there is no h/w issue in the
> > data path.
>
> As above, can you please describe the hardware conditions more
> thoroughly?
>
> > This is an uncore driver; I am not sure a context switch has any
> > influence on this?
>
> I believe that today it's possible for this to happen for cgroup events,
> as nonsensical as it may be to have a cgroup-bound uncore PMU event.
>
> > Please suggest how we can fix this in the back-end PMU driver without
> > any help from perf core?
>
> In order to do so, I need a better explanation of the underlying
> hardware issue.
I will try to add more explanation to the erratum; in the meantime, please
let me know if you have any specific questions.
>
> Thanks,
> Mark.
Thanks,
Ganapat