Message-ID: <CABPqkBSnJcioAeppPXtURu9+qSFpompWMrs-A=FdD76a6-+S8A@mail.gmail.com>
Date: Tue, 19 Apr 2022 14:18:18 -0700
From: Stephane Eranian <eranian@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Wen Yang <wenyang@...ux.alibaba.com>,
Wen Yang <simon.wy@...baba-inc.com>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Thomas Gleixner <tglx@...utronix.de>,
Mark Rutland <mark.rutland@....com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
Borislav Petkov <bp@...en8.de>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RESEND PATCH 2/2] perf/x86: improve the event scheduling to
avoid unnecessary pmu_stop/start
Hi,
Going back to the original description of this patch 2/2, it seems the
problem is that you expected PINNED events to always remain on the same
counters. That is NOT what the interface guarantees. A pinned event is
guaranteed either to be on a counter or to be in error state. While
active, the event can change counters because of event scheduling, and
that is fine: the kernel only computes deltas of the raw counter. If you
are extracting the value with the read() syscall, this is totally
transparent and you will see no jumps. If you are instead using RDPMC,
then you cannot assume the counter index of a pinned event remains the
same. If you do, then yes, you will see discrepancies in the count
returned by RDPMC. You cannot just use RDPMC to read a counter from user
space; you need kernel help. The information you need is in the page you
must mmap() on the fd of the event: it exposes the current counter index
of the event, along with a sequence number and timing data to help scale
the count if necessary. The proper read loop for RDPMC is documented in
include/uapi/linux/perf_event.h, inside the perf_event_mmap_page
definition.
As for TFA, it is not clear to me why it would be a problem unless you
have the RDPMC issue I described above.
On Tue, Apr 19, 2022 at 1:57 PM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Tue, Apr 19, 2022 at 10:16:12PM +0800, Wen Yang wrote:
> > We finally found that TFA (TSX Force Abort) may affect PMC3's behavior,
> > refer to the following patch:
> >
> > 400816f60c54 perf/x86/intel: ("Implement support for TSX Force Abort")
> >
> > When the MSR gets set; the microcode will no longer use PMC3 but will
> > Force Abort every TSX transaction (upon executing COMMIT).
> >
> > When TSX Force Abort (TFA) is allowed (default); the MSR gets set when
> > PMC3 gets scheduled and cleared when, after scheduling, PMC3 is
> > unused.
> >
> > When TFA is not allowed; clear PMC3 from all constraints such that it
> > will not get used.
> >
> >
> > >
> > > However, this patch attempts to avoid the switching of the pmu counters
> > > in various perf_events, so the special behavior of a single pmu counter
> > > will not be propagated to other events.
> > >
> >
> > Since PMC3 may have special behaviors, the continuous switching of PMU
> > counters may not only affect performance but also lead to abnormal
> > data, so please consider this patch again.
>
> I'm not following. How do you get abnormal data?
>
> Are you using RDPMC from userspace? If so, are you following the
> prescribed logic using the self-monitoring interface?