linux-kernel - RE: [PATCH] perf/x86: fix event counter update issue


Message-ID: <37D7C6CF3E00A74B8858931C1DB2F077536A9F55@SHSMSX103.ccr.corp.intel.com>
Date:   Thu, 23 Feb 2017 16:14:11 +0000
From:   "Liang, Kan" <kan.liang@...el.com>
To:     Vince Weaver <vincent.weaver@...ne.edu>
CC:     Peter Zijlstra <peterz@...radead.org>,
        "Odzioba, Lukasz" <lukasz.odzioba@...el.com>,
        Stephane Eranian <eranian@...gle.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>
Subject: RE: [PATCH] perf/x86: fix event counter update issue


> 
> On Wed, 22 Feb 2017, Liang, Kan wrote:
> 
> > > So from what I understand, the issue is if we have an architecture
> > > with full-width counters and we trigger an x86_perf_event_update()
> > > when bit 47 is set?
> >
> > No. It is related to the counter width. The number of bits we can
> > use should be 1 bit less than the total width. Otherwise, there will
> > be a problem.
> > For big cores such as Haswell, Broadwell, and Skylake, the counter
> > width is 48 bits, so we can only use 47 bits.
> > For Silvermont and KNL, the counter width is only 32 bits, I think,
> > so we can only use 31 bits.
> 
> So on a machine with 48-bit counters I should just have a counting event
> that counts to somewhere above 0x8000 0000 0001 and it should show
> problems?
Yes
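
To illustrate why only width - 1 bits are usable, here is a minimal sketch
of a shift-based delta computation in the style of x86_perf_event_update().
It is a user-space illustration, not the kernel source: the function name
counter_delta() and its parameters are made up for this example, and it
assumes the compiler wraps unsigned-to-signed conversion and uses an
arithmetic right shift for signed values (as GCC and Clang do).

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the signed difference between two raw
 * counter reads of a `width`-bit counter (1 <= width <= 64).
 *
 * Both values are shifted up so the counter's top bit lands in bit 63,
 * subtracted, and the signed result is shifted back down. The shift
 * back sign-extends, so a forward step of 2^(width-1) or more between
 * two updates comes out negative.
 */
static int64_t counter_delta(uint64_t prev_raw, uint64_t new_raw, int width)
{
    int shift = 64 - width;
    int64_t delta = (int64_t)((new_raw << shift) - (prev_raw << shift));

    return delta >> shift;
}
```

With width = 48, counter_delta(0, 0x7fffffffffff, 48) is positive, while
counter_delta(0, 0x800000000001, 48) is negative, which matches the
threshold discussed above. A genuine wrap such as 0xffffffffffff -> 0x5
still yields a small positive delta.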

> Because I am unable to trigger this.
> 
> But I guess if anywhere along the line x86_perf_event_update() is run then
> you start over?
> 

Probably. It depends on the value of left (the remaining period).

> I noticed your original reproducer bound the event to a core, is that needed
> to trigger this?

I don't think it's needed, but I haven't tried it without binding the
event to a core.

> 
> Can it happen on a fixed event or only a general-purpose event?

I think it can happen on both, because fixed counters and GP counters
have the same counter width and code path.

> 
> > > So if I have a test that runs in a loop for 2^48 retired
> > > instructions (which takes ~12 hours on a recent machine) and then
> > > reads the results, they might be wrong?
> >
> > It only needs several minutes to reproduce the issue on SLM/KNL.
> 
> Yes, but I only have machines with 48-bit counters.  So it's going to take
> 256 times as long as on a machine with 40-bit counters.
> 
> I have an assembly loop that can consistently generate 2 instructions/cycle
> (I'd be glad to hear suggestions for events that count faster) and on a
> broadwell-ep machine it still takes at least 7 hours or so to get up to
> 0x800000000000.
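
The run-time figures above can be checked with a rough estimate. The sketch
below is only arithmetic: the helper name seconds_to_reach() is made up for
this example, and the 2.8 GHz clock used in the note after it is an
assumption, not a number from this thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper: seconds needed for a counter to reach `target`
 * events when the workload retires `events_per_cycle` events per cycle
 * on a core clocked at `clock_hz`.
 */
static double seconds_to_reach(uint64_t target, double events_per_cycle,
                               double clock_hz)
{
    return (double)target / (events_per_cycle * clock_hz);
}
```

Under the assumed 2.8 GHz clock, seconds_to_reach(1ULL << 47, 2.0, 2.8e9)
comes to roughly 25,000 seconds, or about 7 hours, in line with the figure
quoted above. The factor of 256 is simply 2^48 / 2^40.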

I think you could use an MSR tool to write a large value into IA32_PMC0
during your test.
The full-width writable alias of IA32_PMC0 is at 0x4C1.
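
As one way to do this without a separate tool, here is a minimal sketch
that writes a value through the Linux msr driver's per-CPU device file.
The helper name write_msr() and the macro name are made up for this
example, and the path and value in the note after the code are
illustrative assumptions. It needs root and the msr module loaded.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Writable alias of IA32_PMC0, as given above (macro name is ours). */
#define MSR_IA32_PMC0_ALIAS 0x4C1

/*
 * Hypothetical helper: write a 64-bit value to MSR `reg` through an
 * msr device file such as /dev/cpu/0/msr. The msr driver uses the
 * file offset as the MSR address and expects 8-byte accesses.
 * Returns 0 on success, -1 on failure.
 */
static int write_msr(const char *msr_path, uint32_t reg, uint64_t value)
{
    int fd = open(msr_path, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pwrite(fd, &value, sizeof(value), (off_t)reg);
    close(fd);

    return n == (ssize_t)sizeof(value) ? 0 : -1;
}
```

For example, write_msr("/dev/cpu/0/msr", MSR_IA32_PMC0_ALIAS,
0x7ffffffff000ULL) would place the counter just below bit 47 on CPU 0, so
the running event crosses the threshold within seconds instead of hours.
The CPU number has to match the core the event is actually counting on.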


Thanks,
Kan