linux-kernel - Re: [RFC] perf/x86: Fix a warning on x86_pmu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CABPqkBSvcj1c5-SO4PQLc2KLLf7_0bW4ftUyvFobyd0OVAmLrA@mail.gmail.com>
Date:   Tue, 24 Nov 2020 00:19:34 -0800
From:   Stephane Eranian <eranian@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Namhyung Kim <namhyung@...nel.org>, Ingo Molnar <mingo@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        "H. Peter Anvin" <hpa@...or.com>, x86 <x86@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        John Sperbeck <jsperbeck@...gle.com>,
        "Lendacky, Thomas" <Thomas.Lendacky@....com>,
        Andi Kleen <ak@...ux.intel.com>
Subject: Re: [RFC] perf/x86: Fix a warning on x86_pmu_stop()

Hi,

Another remark on the PEBS drainage code, it seems to me like a test
is not quite correct:
intel_pmu_drain_pebs_nhm()
{
...
               if (p->status != (1ULL << bit)) {
                        for_each_set_bit(i, (unsigned long *)&pebs_status, size)
                                error[i]++;
                        continue;
                }

The kernel cannot disambiguate when 2+ PEBS counters overflow at the
same time. This is what the comment for this code suggests.
However, I see the comparison is done with the unfiltered p->status
which is a copy of  IA32_PERF_GLOBAL_STATUS at the time of
the sample. This register contains more than the PEBS counter overflow
bits. It also includes many other bits which could also be set.

Shouldn't this test use pebs_status instead (which covers only the
PEBS counters)?

          if (pebs_status != (1ULL << bit)) {
          }

Or am I missing something?
Thanks.


On Tue, Nov 24, 2020 at 12:09 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Tue, Nov 24, 2020 at 02:01:39PM +0900, Namhyung Kim wrote:
>
> > Yes, it's not about __intel_pmu_pebs_event().  I'm looking at
> > intel_pmu_drain_pebs_nhm() specifically.  There's code like
> >
> >         /* log dropped samples number */
> >         if (error[bit]) {
> >             perf_log_lost_samples(event, error[bit]);
> >
> >             if (perf_event_account_interrupt(event))
> >                 x86_pmu_stop(event, 0);
> >         }
> >
> >         if (counts[bit]) {
> >             __intel_pmu_pebs_event(event, iregs, base,
> >                            top, bit, counts[bit],
> >                            setup_pebs_fixed_sample_data);
> >         }
> >
> > There's a path to x86_pmu_stop() when an error bit is on.
>
> That would seem to suggest you try something like this:
>
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 31b9e58b03fe..8c6ee8be8b6e 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1945,7 +1945,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
>                 if (error[bit]) {
>                         perf_log_lost_samples(event, error[bit]);
>
> -                       if (perf_event_account_interrupt(event))
> +                       if (iregs && perf_event_account_interrupt(event))
>                                 x86_pmu_stop(event, 0);
>                 }
>