lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 22 Feb 2017 15:28:26 +0000
From:   "Liang, Kan" <kan.liang@...el.com>
To:     Vince Weaver <vincent.weaver@...ne.edu>,
        Peter Zijlstra <peterz@...radead.org>
CC:     "Odzioba, Lukasz" <lukasz.odzioba@...el.com>,
        Stephane Eranian <eranian@...gle.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>
Subject: RE: [PATCH] perf/x86: fix event counter update issue


> 
> On Mon, 5 Dec 2016, Peter Zijlstra wrote:
> 
> > ---
> > Subject: perf,x86: Fix full width counter, counter overflow
> > Date: Tue, 29 Nov 2016 20:33:28 +0000
> >
> > Lukasz reported that perf stat counters overflow is broken on KNL/SLM.
> >
> > Both these parts have full_width_write set, and that does indeed have
> > a problem. In order to deal with counter wrap, we must sample the
> > counter at at least half the counter period (see also the sampling
> > theorem) such that we can unambiguously reconstruct the count.
> 
> I know I'm a bit late to this issue, but I suddenly have PAPI users being very
> worried that their results are going to be wrong ad I hadn't heard about
> this until recently.
> 
> I'm trying to make a reproducer test and want to make sure I understand
> the issue.  (And I don't have any of the easy trigger hardware either.
> And what is SLM?  Silvermont?  

Yes.

> Are we really that short on Changelog space
> that we can't spell out the abbreviations to make things clear for non-Intel
> employees?)

Here is the original patch I posted for this issue, which include the test cases.
https://lkml.org/lkml/2016/11/28/540
After the discussion, my patch was discarded. Now, we use Peter's fix.

> 
> So from what I understand, the issue is if we have an architecture with full-
> width counters and we trigger a x86_perf_event_update() when bit
> 47 is set?

No. It related to the counter width. The number of bits we can use should be
1 bit less than the total width. Otherwise, there will be problem.
For big cores such as haswell, broadwell, skylake, the counter width is 48 bit.
So we can only use 47 bits.
For Silvermont and KNL, the counter width is only 32 bit I think. So we can only
use 31 bits.

> 
> So if I have a test that runs in a loop for 2^48 retired instructions (which
> takes ~12 hours on a recent machine) and then reads the results, they
> might be wrong?

It only needs several minutes to reproduce the issue on SLM/KNL.

Thanks,
Kan 

> 
> It sounds like this can also be triggered by a sampling event with a really
> long period, but I couldn't puzzle out from the Changelog exactly how to
> reproduce this (or even how serious the issue is).
> 
> Vince
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ