lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161129092520.GB3092@twins.programming.kicks-ass.net>
Date:   Tue, 29 Nov 2016 10:25:20 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     kan.liang@...el.com
Cc:     mingo@...hat.com, linux-kernel@...r.kernel.org, eranian@...gle.com,
        alexander.shishkin@...ux.intel.com, ak@...ux.intel.com,
        lukasz.odzioba@...el.com
Subject: Re: [PATCH] perf/x86: fix event counter update issue


So caveat that I'm ill and cannot think much..

On Mon, Nov 28, 2016 at 11:26:46AM -0800, kan.liang@...el.com wrote:

> Here, all the possible failure cases are listed.
> Terms:
>     - new: current PMU counter value which read from rdpmcl.
>     - prev: previous counter value which is stored in &hwc->prev_count.
>     - in PMI/not in PMI: the event update happens in PMI handler or not.

> Current code to calculate delta:
>     delta = (new << shift) - (prev << shift);
>     delta >>= shift;
> 
> Case A: Not in PMI.  new > prev. But delta is negative.
>    That's the failure case of Test 2.
>    delta is s64 type. new and prev are u64 type. If the new is big
>    enough, after doing left shift and sub, the bit 64 of the result may
>    still be 1.
>    After converting to s64, the sign flag will be set. Since delta is
>    s64 type, arithmetic right shift is applied, which copy the sign flag
>    into empty bit positions on the upper end of delta.
>    It can be fixed by adding the max count value.
> 
>    Here is the real data for test2 on KNL.
>    new = aea96e1927
>    prev = ffffff0000000001
>    delta = aea96e1927000000 - 1000000 = aea96e1926000000
>    aea96e1926000000 >> 24 = ffffffaea96e1926   <<  negative delta

How can this happen? IIRC the thing increments, we program a negative
value, and when it passes 0 we generate a PMI.

And note that we _ALWAYS_ set the IN bits, even for !sampling events.
Also note we set max_period to (1<<31) - 1, so we should never exceed 31
bits.


> Case B: In PMI. new > prev. delta is positive.
>    That's the failure case of Test 3.
>    The PMI is triggered by overflow. But there is latency between
>    overflow and PMI handler. So new has small amount.
>    Current calculation lose the max count value.

That doesn't make sense, per the 31bit limit.


> Case C: In PMI. new < prev. delta is negative.
>    The PMU counter may be start from a big value. E.g. the fixed period
>    is small.
>    It can be fixed by adding the max count value.

Doesn't make sense, how can this happen?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ