[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <A5F7CF90-27F9-476C-B87C-CAD2A6BE5DA4@fb.com>
Date: Thu, 19 Aug 2021 18:22:07 +0000
From: Song Liu <songliubraving@...com>
To: Peter Zijlstra <peterz@...radead.org>
CC: "open list:BPF (Safe dynamic programs and tools)"
<bpf@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"acme@...nel.org" <acme@...nel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
Kernel Team <Kernel-team@...com>,
Kan Liang <kan.liang@...ux.intel.com>,
"Like Xu" <like.xu@...ux.intel.com>,
Alexey Budankov <alexey.budankov@...ux.intel.com>
Subject: Re: [RFC] bpf: lbr: enable reading LBR from tracing bpf programs
Hi Peter,
> On Aug 19, 2021, at 11:06 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Aug 19, 2021 at 04:46:20PM +0000, Song Liu wrote:
>>> void perf_inject_event(struct perf_event *event, struct pt_regs *regs)
>>> {
>>> struct perf_sample_data data;
>>> struct pmu *pmu = event->pmu;
>>> unsigned long flags;
>>>
>>> local_irq_save(flags);
>>> perf_pmu_disable(pmu);
>>>
>>> perf_sample_data_init(&data, 0, 0);
>>> /*
>>> * XXX or a variant with more _ that starts at the overflow
>>> * handler...
>>> */
>>> __perf_event_overflow(event, 0, &data, regs);
>>>
>>> perf_pmu_enable(pmu);
>>> local_irq_restore(flags);
>>> }
>>>
>>> But please consider carefully, I haven't...
>>
>> Hmm... This is a little weird to me.
>> IIUC, we need to call perf_inject_event() after the software event, say
>> a kretprobe, triggers. So it gonna look like:
>>
>> 1. kretprobe trigger;
>> 2. handler calls perf_inject_event();
>> 3. PMI kicks in, and saves LBR;
>
> This doesn't actually happen. I overlooked the fact that we need the PMI
> to fill out @data for us.
>
>> 4. after the PMI, consumer of LBR uses the saved data;
>
> Normal overflow handler will have data->br_stack set, but I now realize
> that the 'psuedo' code above will not get that. We need to somehow get
> the arch bits involved; again :/
>
>> However, given perf_inject_event() disables PMU, we can just save the LBR
>> right there? And it should be a lot easier? Something like:
>>
>> 1. kretprobe triggers;
>> 2. handler calls perf_snapshot_lbr();
>> 2.1 perf_pmu_disable(pmu);
>> 2.2 saves LBR
>> 2.3 perf_pmu_enable(pmu);
>> 3. consumer of LBR uses the saved data;
>>
>> What is the downside of this approach?
>
> It would be perf_snapshot_branch_stack() and would require a new
> (optional) pmu::method to set up the branch stack.
I guess it would look like:
diff --git i/include/linux/perf_event.h w/include/linux/perf_event.h
index fe156a8170aa3..af379b7f18050 100644
--- i/include/linux/perf_event.h
+++ w/include/linux/perf_event.h
@@ -514,6 +514,9 @@ struct pmu {
* Check period value for PERF_EVENT_IOC_PERIOD ioctl.
*/
int (*check_period) (struct perf_event *event, u64 value); /* optional */
+
+ int (*snapshot_branch_stack) (struct perf_event *event, /* TBD, maybe struct
+ perf_output_handle? */);
};
enum perf_addr_filter_action_t {
diff --git i/kernel/events/core.c w/kernel/events/core.c
index 2d1e63dd97f23..14aa5f7bccf1f 100644
--- i/kernel/events/core.c
+++ w/kernel/events/core.c
@@ -1207,6 +1207,19 @@ void perf_pmu_enable(struct pmu *pmu)
pmu->pmu_enable(pmu);
}
+int perf_snapshot_branch_stack(struct perf_event *event)
+{
+ struct pmu *pmu = event->pmu;
+ int ret;
+
+ if (!pmu->snapshot_branch_stack)
+ return -EOPNOTSUPP;
+ perf_pmu_disable(pmu);
+ ret = pmu->snapshot_branch_stack(event, ...);
+ perf_pmu_enable(pmu);
+ return 0;
+}
+
static DEFINE_PER_CPU(struct list_head, active_ctx_list);
>
> And if we're going to be adding new pmu::methods then I figure one that
> does the whole sample state might be more useful.
What do you mean by "whole sample state"? To integrate with exiting
perf_sample_data, like perf_output_sample()?
Thanks,
Song
Powered by blists - more mailing lists