linux-kernel - [BUG] perf_event: semantic of PERF_SAMPLE


Message-ID: <CABPqkBS_Og3Vbe90xwyzA8S5ftfF08gMtJdtpqQTHFOwm+CAkg@mail.gmail.com>
Date:	Thu, 25 Aug 2011 19:19:15 +0200
From:	Stephane Eranian <eranian@...gle.com>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	Peter Zijlstra <peterz@...radead.org>, mingo@...e.hu,
	Robert Richter <robert.richter@....com>,
	Vince Weaver <vweaver1@...s.utk.edu>
Subject: [BUG] perf_event: semantic of PERF_SAMPLE_READ unclear

Hi,

I was looking at the kernel code dealing with PERF_SAMPLE_READ. You use
this option if you want to capture the values of the other events in your
event group on overflow.
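
For concreteness, here is a minimal sketch of how a sampling group leader
might be configured to request this. The helper name make_leader_attr(), the
choice of the cycles event, and the period are illustrative assumptions, not
taken from any existing tool; the constants and fields are those of
<linux/perf_event.h>.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Hypothetical helper: build the attr for a sampling group leader whose
 * samples also carry the counts of the events in its group.
 */
static struct perf_event_attr make_leader_attr(unsigned long long period)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;   /* assumed leader event */
	attr.sample_period = period;
	/* PERF_SAMPLE_READ asks for counter values in each sample record. */
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_READ;
	/* PERF_FORMAT_GROUP makes those values cover the whole group. */
	attr.read_format = PERF_FORMAT_GROUP;
	attr.disabled = 1;                        /* enable once the group is built */
	return attr;
}
```

The returned attr would then be passed to the perf_event_open() syscall for
the leader, with the other events opened against the leader's fd.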

This is similar to what you can do with a read() on an event group leader.
If you've set up PERF_FORMAT_GROUP in read_format, then you can read the
values of the other events in your event group.
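
As a rough illustration of the read() side, the sketch below unpacks the
buffer returned for a group leader, assuming read_format is exactly
PERF_FORMAT_GROUP (no time or ID fields). In that case the buffer is one
u64 count of events followed by one u64 value per event. The function name
parse_group_read() is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: unpack a group read() buffer laid out as
 *   u64 nr; u64 values[nr];
 * which is the layout when read_format == PERF_FORMAT_GROUP only.
 * Returns the number of values copied, or -1 if the buffer is too short
 * or the output array is too small.
 */
static int parse_group_read(const uint64_t *buf, size_t buf_words,
			    uint64_t *out, size_t out_cap)
{
	uint64_t nr, i;

	if (buf_words < 1)
		return -1;
	nr = buf[0];
	if (nr > buf_words - 1 || nr > out_cap)
		return -1;
	for (i = 0; i < nr; i++)
		out[i] = buf[1 + i];   /* leader first, then the other events */
	return (int)nr;
}
```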

The issue at stake here is which values you get for the other counters. In
particular, how recent are those values? Ideally, you'd like those values
to be as recent as the value of the main event.

In the case of sampling, you'd like to capture the values of the other events at
the time of the overflow or very close to it.

In either case, you'd like to get a consistent view of the events, i.e.,
take their values as close together in time as possible.

In the case of read(), the values are all retrieved from the actual
counters if the event group is active. Thus, you get the most recent values
possible. If the group is not active, then the counters have been saved,
and the SW counters hold the most recent values.

In the case of sampling, however, it is not clear what you get.

The perf_output_read() routine does not read the actual counters. Instead,
it relies on the SW counter, event->count, which was last updated via
x86_perf_event_update() at some unknown earlier point. I think this could
be a problem, as the 'snapshot' you're getting is not really consistent.

I think the perf_output_read() function must read the actual counters or
force an update of the SW counters before saving the counts into the
buffer. Because we are in the interrupt handler, we are guaranteed to have
the events in the actual counters. The difficulty is that we cannot grab
any locks, although I'm not sure we need one given the call path.
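
To make the concern concrete, here is a small user-space toy model, not
kernel code. Every name in it (toy_event, toy_update, toy_snapshot) is
invented for illustration; toy_update() only mimics the general
"fold the delta since the last read into the software count" idea, and it
ignores counter width, wraparound, and concurrency.

```c
#include <stdint.h>

/* Toy stand-in for an event: a live counter plus a cached software count. */
struct toy_event {
	uint64_t hw_counter;   /* models the live hardware counter */
	uint64_t prev_raw;     /* raw value seen at the last update */
	uint64_t count;        /* models the software count (event->count) */
};

/* Fold whatever accumulated since the last update into the cached count. */
static void toy_update(struct toy_event *e)
{
	uint64_t delta = e->hw_counter - e->prev_raw;

	e->prev_raw = e->hw_counter;
	e->count += delta;
}

/*
 * Snapshot a group. With force_update == 0 this copies the cached counts,
 * which may be stale by different amounts for each event. With
 * force_update != 0 each count is refreshed first.
 */
static void toy_snapshot(struct toy_event *group, int n, int force_update,
			 uint64_t *out)
{
	int i;

	for (i = 0; i < n; i++) {
		if (force_update)
			toy_update(&group[i]);
		out[i] = group[i].count;
	}
}
```

In this model, an event whose cached count was last refreshed long ago
reports an old value next to a sibling that was refreshed recently, which
is the inconsistency described above. Forcing the update before copying
removes it.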