linux-kernel - Re: perf: bisected sampling bug in Linux 4.11-rc1

Message-ID: <alpine.DEB.2.20.1707141614220.28363@macbook-air>
Date:   Fri, 14 Jul 2017 16:14:26 -0400 (EDT)
From:   Vince Weaver <vincent.weaver@...ne.edu>
To:     Alexander Shishkin <alexander.shishkin@...ux.intel.com>
cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Stephane Eranian <eranian@...il.com>
Subject: Re: perf: bisected sampling bug in Linux 4.11-rc1

On Fri, 14 Jul 2017, Alexander Shishkin wrote:

> Vince Weaver <vincent.weaver@...ne.edu> writes:
> 
> > I was tracking down some regressions in my perf_event_test testsuite.
> > Some of the tests broke in the 4.11-rc1 timeframe.
> >
> > I've bisected one of them, this report is about
> > 	tests/overflow/simul_oneshot_group_overflow
> > This test creates an event group containing two sampling events, set
> > to overflow to a signal handler (which disables and then refreshes the 
> > event).
> >
> > On a good kernel you get the following:
> > 	Event perf::instructions with period 1000000
> > 	Event perf::instructions with period 2000000
> > 		fd 3 overflows: 946 (perf::instructions/1000000)
> > 		fd 4 overflows: 473 (perf::instructions/2000000)
> > 	Ending counts:
> > 		Count 0: 946379875
> > 		Count 1: 946365218
> >
> > With the broken kernels you get:
> > 	Event perf::instructions with period 1000000
> > 	Event perf::instructions with period 2000000
> > 		fd 3 overflows: 938 (perf::instructions/1000000)
> > 		fd 4 overflows: 318 (perf::instructions/2000000)
> > 	Ending counts:
> > 		Count 0: 946373080
> > 		Count 1: 653373058
> 
> I'm not sure I'm seeing it (granted, it's a friday evening): is it the
> difference in overflow counts?

It's two things.
	The test creates a grouped event, with both events being
	perf::instructions.

	1.  The total count at the end should be the same for both
		(on the failing kernels it is not)
	2.  The overflow count for each event should be roughly
		total_events/sample_period.
		(on the failing kernels it is not)

> Also, are they cpu or task bound?

The open looks like this:
	perf_event_open(&pe,0,-1,-1,0);

In the failing case, the group leader is pinned.

The source code for the test is here:
	https://github.com/deater/perf_event_tests/blob/master/tests/overflow/simul_oneshot_group_overflow.c

Vince