linux-kernel - Re: [PATCH v2 1/3] perf/core: Add a tracepoint for perf sampling

Message-ID: <CAJN39ogefQc3i4qRuQg4HmMhqn-Ah1aHYTQ1=-fr0WSBhXiShw@mail.gmail.com>
Date:	Fri, 5 Aug 2016 10:22:08 -0700
From:	Brendan Gregg <bgregg@...flix.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	linux-kernel@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
	Wang Nan <wangnan0@...wei.com>
Subject: Re: [PATCH v2 1/3] perf/core: Add a tracepoint for perf sampling

On Fri, Aug 5, 2016 at 3:52 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, Aug 04, 2016 at 10:24:06PM -0700, Alexei Starovoitov wrote:
>> tracepoints are actually zero overhead already via static-key mechanism.
>> I don't think Peter's objection for the tracepoint was due to overhead.
>
> Almost 0, they still have some I$ footprint, but yes. My main worry is
> that we can feed tracepoints into perf, so having tracepoints in perf is
> tricky.

Coincidentally, I$ footprint was my most recent use case for needing
this: I have an I$-busting workload, and want to profile
instructions at a very high rate to get a breakdown of I$ population.
(Normally I'd use I$ miss overflow, but none of our Linux systems have
PMCs: cloud.)

> I also don't much like this tracepoint being specific to the hrtimer
> bits, I can well imagine people wanting to do the same thing for
> hardware based samples or whatnot.

Sure, which is why I thought we'd have two in a perf category. I'm all
for PMC events, even though we can't currently use them!

>
>> > The perf:perf_hrtimer probe point is also reading state mid-way
>> > through a function, so it's not quite as simple as wrapping the
>> > function pointer. I do like that idea, though, but for things like
>> > struct file_operations.
>
> So what additional state do you need?

I was pulling in regs after get_irq_regs(), struct perf_event *event
after it's populated. Not that hard to duplicate. Just noting it
didn't map directly to the function entry.

I wanted perf_event just for event->ctx->task->pid, so that a BPF
program can differentiate between its samples and other concurrent
sessions.

(I was thinking of changing my patch to expose pid_t instead of
perf_event, since I was noticing it didn't add many instructions.)
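
To illustrate the pid filtering described above, here is a minimal
user-space sketch in plain C. It is not a real BPF program and not the
patch's interface: struct sample_args and handle_sample() are
hypothetical names invented for this example, standing in for whatever
arguments the tracepoint would eventually expose.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the tracepoint arguments; not a kernel type. */
struct sample_args {
    unsigned long ip; /* instruction pointer, as would be read from pt_regs */
    pid_t pid;        /* pid of the task that owns the sampling session */
};

/*
 * Count a sample only if it belongs to our own session.
 * Returns 1 if the sample was counted, 0 if it was ignored.
 */
static int handle_sample(const struct sample_args *args, pid_t my_pid,
                         unsigned long *count)
{
    assert(args != NULL && count != NULL);

    if (args->pid != my_pid)
        return 0; /* sample from another concurrent session: skip it */

    (*count)++;
    return 1;
}
```

The only point is that a single pid_t is enough for the handler to
discard samples that come from someone else's session.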

[...]
>> instead of adding a tracepoint to perf_swevent_hrtimer we can replace
>> overflow_handler for that particular event with some form of bpf wrapper.
>> (probably new bpf program type). Then not only periodic events
>> will be triggering bpf prog, but pmu events as well.
>
> Exactly.

Although the timer use case is a bit different, and is via
hwc->hrtimer.function = perf_swevent_hrtimer.

[...]
>> The question is what to pass into the
>> program to make the most use out of it. 'struct pt_regs' is done deal.
>> but perf_sample_data we cannot pass as-is, since it's kernel internal.
>
> Urgh, does it have to be stable API? Can't we simply rely on the kernel
> headers to provide the right structure definition?

For timer it can be: struct pt_regs, pid_t.

So that would restrict your BPF program to one timer, since if you had
two (from one pid) you couldn't tell them apart. But I'm not sure of a
use case for two in-kernel timers. If there were, we could also add
struct perf_event_attr, which has enough info to tell things apart,
and is already exposed to user space.
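
As a rough sketch of how attr fields could tell two timers from the
same pid apart, again in plain user-space C: struct attr_key is a
hypothetical trimmed copy of three struct perf_event_attr fields
(type, config, sample_period), and match_timer() is an invented helper,
not an existing kernel or BPF function.

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical trimmed copy of a few struct perf_event_attr fields. */
struct attr_key {
    uint32_t type;          /* e.g. a software event type */
    uint64_t config;        /* event selector within that type */
    uint64_t sample_period; /* sampling period of the timer */
};

/* Hypothetical per-sample arguments: owner pid plus the event's attr. */
struct timer_sample {
    pid_t pid;
    struct attr_key attr;
};

/*
 * Return the index of the timer in timers[] that produced the sample,
 * or -1 if the sample is from another pid or matches no known timer.
 */
static int match_timer(const struct timer_sample *s, pid_t my_pid,
                       const struct attr_key *timers, int ntimers)
{
    if (s->pid != my_pid)
        return -1;

    for (int i = 0; i < ntimers; i++) {
        if (timers[i].type == s->attr.type &&
            timers[i].config == s->attr.config &&
            timers[i].sample_period == s->attr.sample_period)
            return i;
    }
    return -1;
}
```

With only pid_t, both timers below would look identical; comparing the
attr fields is what separates them, assuming they differ in at least
one of the compared fields.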

I haven't looked into the PMU arguments, but perhaps that could be:
struct pt_regs, pid_t, struct perf_event_attr.

Thanks,

Brendan