linux-kernel - Re: [RFC] bpf: lbr: enable reading LBR from tracing bpf programs

Message-ID: <962EDD5A-1B35-4C7F-A0A1-3EBC32EE63AB@fb.com>
Date:   Wed, 18 Aug 2021 16:46:32 +0000
From:   Song Liu <songliubraving@...com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     "open list:BPF (Safe dynamic programs and tools)" 
        <bpf@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "acme@...nel.org" <acme@...nel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        Kernel Team <Kernel-team@...com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        "Like Xu" <like.xu@...ux.intel.com>,
        Alexey Budankov <alexey.budankov@...ux.intel.com>
Subject: Re: [RFC] bpf: lbr: enable reading LBR from tracing bpf programs

Hi Peter,

Thanks for your quick response!

> On Aug 18, 2021, at 2:15 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> 
> On Tue, Aug 17, 2021 at 06:29:37PM -0700, Song Liu wrote:
>> The typical way to access LBR is via hardware perf_event. For CPUs with
>> FREEZE_LBRS_ON_PMI support, PMI could capture reliable LBR. On the other
>> hand, LBR could also be useful in non-PMI scenarios. For example, in a
>> kretprobe or bpf fexit program, LBR could provide a lot of information
>> on what happened with the function.
>> 
>> In this RFC, we try to enable LBR for BPF programs. This works like:
>>  1. Create a hardware perf_event with PERF_SAMPLE_BRANCH_* on each CPU;
>>  2. Call a new bpf helper (bpf_get_branch_trace) from the BPF program;
>>  3. Before calling this bpf program, the kernel stops LBR on the local
>>     CPU, makes a copy of LBR, and resumes LBR;
>>  4. In the bpf program, the helper accesses the copy from #3.
>> 
>> Please see tools/testing/selftests/bpf/[progs|prog_tests]/get_call_trace.c
>> for a detailed example. Note that this process is far from ideal, but it
>> allows quick prototyping of this feature.
>> 
>> AFAICT, the biggest challenge here is that we are now sharing LBR in PMI
>> and out of PMI, which could trigger some interesting race conditions.
>> However, if we allow some level of missed/corrupted samples, this should
>> still be very useful.
>> 
>> Please share your thoughts and comments on this. Thanks in advance!
> 
>> +int bpf_branch_record_read(void)
>> +{
>> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +
>> +	intel_pmu_lbr_disable_all();
>> +	intel_pmu_lbr_read();
>> +	memcpy(this_cpu_ptr(&bpf_lbr_entries), cpuc->lbr_entries,
>> +	       sizeof(struct perf_branch_entry) * x86_pmu.lbr_nr);
>> +	*this_cpu_ptr(&bpf_lbr_cnt) = x86_pmu.lbr_nr;
>> +	intel_pmu_lbr_enable_all(false);
>> +	return 0;
>> +}
> 
> Urgghhh.. I so really hate BPF specials like this.

I don't really like this design either. But it does show that LBR can be
very useful in non-PMI scenarios.

> Also, the PMI race
> you describe is because you're doing abysmal layer violations. If you'd
> have used perf_pmu_disable() that wouldn't have been a problem.

Do you mean that instead of disabling/enabling LBR, we disable/enable the
whole PMU?

> 
> I'd much rather see a generic 'fake/inject' PMI facility, something that
> works across the board and isn't tied to x86/intel.

How would that work? Do we have a function to trigger a PMI from software,
and then gather the LBR data after the PMI? This does sound like a much
cleaner solution. Where can I find code examples that fake/inject a PMI?

There is another limitation right now: we need to enable LBR with a
hardware perf event (cycles, etc.). However, unless we use the event for
something else, it wastes a hardware counter. So I was thinking of
allowing a software event, i.e. a dummy event, to enable LBR. Does this
idea sound sane to you?
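
For reference, here is a minimal userspace sketch of the current setup: a
hardware cycles event opened on one CPU whose only purpose is to turn on
LBR. This is an illustration, not the selftest code. The helper names
(lbr_cycles_attr, open_lbr_event) and the choice of branch filter flags
are assumptions; only perf_event_open() and the PERF_* constants come
from the existing perf ABI.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: describe a hardware cycles event that also
 * requests branch (LBR) recording. The filter flags are an assumed
 * choice: any branch type, in both kernel and user space.
 */
void lbr_cycles_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_type = PERF_SAMPLE_BRANCH_STACK;
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_KERNEL |
				   PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_ANY;
}

/*
 * Hypothetical helper: open the event on one CPU for all tasks
 * (pid == -1, no group leader, no flags). Returns the event fd,
 * or -1 with errno set on failure.
 */
int open_lbr_event(int cpu)
{
	struct perf_event_attr attr;

	lbr_cycles_attr(&attr);
	return (int)syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
}
```

Calling open_lbr_event() once per online CPU would cover the whole
system. Each such event is a real hardware event, which is the counter
cost described above.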

Thanks,
Song