Date: Fri, 31 May 2024 14:01:08 +0100
From: Mark Rutland <mark.rutland@....com>
To: James Clark <james.clark@....com>,
	Peter Zijlstra <peterz@...radead.org>
Cc: Anshuman Khandual <anshuman.khandual@....com>,
	Mark Brown <broonie@...nel.org>, Rob Herring <robh@...nel.org>,
	Marc Zyngier <maz@...nel.org>,
	Suzuki Poulose <suzuki.poulose@....com>,
	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	linux-perf-users@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
	will@...nel.org, catalin.marinas@....com
Subject: Re: [PATCH V17 0/9] arm64/perf: Enable branch stack sampling

On Thu, May 30, 2024 at 06:41:14PM +0100, Mark Rutland wrote:
> On Thu, May 30, 2024 at 10:47:34AM +0100, James Clark wrote:
> > On 05/04/2024 03:46, Anshuman Khandual wrote:
> > > ------------------ Possible 'branch_sample_type' Mismatch -----------------
> > > 
> > > Branch stack sampling attributes 'event->attr.branch_sample_type' generally
> > > remain the same for all the events during a perf record session.
> > > 
> > > $perf record -e <event_1> -e <event_2> -j <branch_filters> [workload]
> > > 
> > > event_1->attr.branch_sample_type == event_2->attr.branch_sample_type
> > > 
> > > This 'branch_sample_type' is used to configure the BRBE hardware when both
> > > events, i.e. <event_1> and <event_2>, get scheduled on a given PMU. But during
> > > PMU HW event privilege filter inheritance, 'branch_sample_type' does not
> > > remain the same for all events. Consider the following example:
> > > 
> > > $perf record -e cycles:u -e instructions:k -j any,save_type ls
> > > 
> > > cycles->attr.branch_sample_type != instructions->attr.branch_sample_type
> > > 
> > > This is because the cycles event inherits PERF_SAMPLE_BRANCH_USER and the
> > > instructions event inherits PERF_SAMPLE_BRANCH_KERNEL. The proposed solution
> > > here configures the BRBE hardware with the 'branch_sample_type' of the last
> > > event to be added to the PMU, and hence captured branch records only get
> > > passed on to matching events during a PMU interrupt.
> > > 
> > 
> > Hi Anshuman,
> > 
> > Surely because of this example we should merge? At least we have to try
> > to make the most common basic command lines work. Unless we expect all
> > tools to know whether the branch buffer is shared between PMUs on each
> > architecture or not. The driver knows, though, so it can merge the settings
> > because it all has to go into one BRBE.
> 
> The difficulty here is that these are opened as independent events (not
> in the same event group), and so from the driver's PoV, this is no
> different to two users independently doing:
> 
> 	perf record -e event:u -j any,save_type -p ${SOME_PID}
> 
> 	perf record -e event:k -j any,save_type -p ${SOME_PID}
> 
> .. where either would be surprised to get the merged result.
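
For concreteness, those two sessions boil down to something like the
below at the perf_event_open() level (just a sketch to show the
conflicting branch_sample_type values; open_branch_event() is a made-up
helper, not tool or driver code):

| /* Sketch: two independently opened events with conflicting branch filters. */
| #include <linux/perf_event.h>
| #include <stdlib.h>
| #include <string.h>
| #include <sys/syscall.h>
| #include <unistd.h>
| 
| static int open_branch_event(__u64 config, __u64 branch_type, pid_t pid)
| {
| 	struct perf_event_attr attr;
| 
| 	memset(&attr, 0, sizeof(attr));
| 	attr.size = sizeof(attr);
| 	attr.type = PERF_TYPE_HARDWARE;
| 	attr.config = config;
| 	attr.sample_period = 4000;
| 	attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
| 	attr.branch_sample_type = branch_type;
| 
| 	return syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
| }
| 
| int main(int argc, char *argv[])
| {
| 	pid_t pid;
| 	int fd_user, fd_kern;
| 
| 	if (argc < 2)
| 		return 1;
| 	pid = atoi(argv[1]);
| 
| 	/* "Session 1": cycles, user-only branch filtering. */
| 	fd_user = open_branch_event(PERF_COUNT_HW_CPU_CYCLES,
| 				    PERF_SAMPLE_BRANCH_ANY |
| 				    PERF_SAMPLE_BRANCH_USER, pid);
| 
| 	/* "Session 2": cycles, kernel-only branch filtering, same task. */
| 	fd_kern = open_branch_event(PERF_COUNT_HW_CPU_CYCLES,
| 				    PERF_SAMPLE_BRANCH_ANY |
| 				    PERF_SAMPLE_BRANCH_KERNEL, pid);
| 
| 	/*
| 	 * Both events can end up scheduled on the same PMU at the same
| 	 * time, but the shared branch unit can only be programmed with
| 	 * one of the two filters.
| 	 */
| 	return fd_user < 0 || fd_kern < 0;
| }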

I took a look at how x86 handles this, and it looks like it may have the
problem we'd like to avoid. AFAICT, intel_pmu_lbr_add() blats cpuc->br_sel
with the branch selection of the last event added, and so whichever event
was added last determines the filtering for everything sharing the LBR.

So I tried this out on my x86-64 desktop, running v5.10.0-9-amd64 from
Debian 11.

Running the following program:

| int main (int argc, char *argv[])
| {
|         /* Spin forever in userspace; the loop's own branches are all
|          * user->user. The empty asm just stops the compiler optimising
|          * the loop away. */
|         for (;;) {
|                 asm volatile("" ::: "memory");
|         }
| 
|         return 0;
| }
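
.. built with something like 'gcc -o loop loop.c' (the exact flags
shouldn't matter here), with the PID used below coming from e.g.
'pidof loop'.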

I set /proc/sys/kernel/perf_event_paranoid to 2 and started two independent
perf sessions:

	perf record -e cycles:u -j any -o perf-user.data -p 1320224

	sudo perf record -e cycles:k -j any -o perf-kernel.data -p 1320224

.. after ~10 seconds, I killed both sessions with ^C.

When I subsequently do 'perf report -i perf-kernel.data', I see:

| Samples: 295  of event 'cycles:k', Event count (approx.): 295
| Overhead  Command  Source Shared Object  Source Symbol               Target Symbol  Basic Block Cycles
|   99.66%  loop     loop                  [k] main                    [k] main       -
|    0.34%  loop     [kernel.kallsyms]     [k] native_irq_return_iret  [k] main       -

.. where the user symbols are surprising.

Similarly for 'perf report -i perf-user.data', I see:

| Samples: 198K of event 'cycles:u', Event count (approx.): 198739
| Overhead  Command  Source Shared Object  Source Symbol           Target Symbol           Basic Block Cycles
|   99.99%  loop     loop                  [.] main                [.] main                -
|    0.00%  loop     [unknown]             [.] 0xffffffff87801007  [.] main                -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e05626  [.] 0xffffffff86e05629  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e0563d  [.] 0xffffffff86e0c850  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e0c86f  [.] 0xffffffff86e6b3f0  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e0c884  [.] 0xffffffff86e11ed0  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e0c88a  [.] 0xffffffff86e13850  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e11eee  [.] 0xffffffff86e0c889  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e13885  [.] 0xffffffff86e13888  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e13889  [.] 0xffffffff86e138a1  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e138a9  [.] 0xffffffff86e6b320  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e138c3  [.] 0xffffffff86e6b3f0  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e6b33a  [.] 0xffffffff86e138ae  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e6b3fb  [.] 0xffffffff86e0c874  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86ff6c91  [.] 0xffffffff87a01ca0  -
|    0.00%  loop     [unknown]             [.] 0xffffffff87a01ca0  [.] 0xffffffff87a01ca5  -
|    0.00%  loop     [unknown]             [.] 0xffffffff87a01ca5  [.] 0xffffffff87a01cb1  -
|    0.00%  loop     [unknown]             [.] 0xffffffff87a01cb5  [.] 0xffffffff86e05600  -

.. where the unknown (kernel!) samples are surprising.

Peter, do you have any opinion on this?

My thinking is that "last scheduled event's branch selection wins"
isn't the behaviour we actually want, and either:

(a) Conflicting events shouldn't be scheduled concurrently (e.g. treat
    that like running out of counters).

(b) The HW filters should be configured to allow anything permitted by
    any of the events, and SW filtering should remove the unexpected
    records on a per-event basis (rough sketch below).

.. but I imagine (b) may be hard? I don't know whether LBR tells you which
CPU mode the src/dst were in.
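
For the record, (b) would look something like the below (a userspace toy
with made-up struct/helper names, not BRBE or LBR driver code; note that
it guesses each record's privilege from the addresses, which is exactly
the part I'm not sure the HW can report reliably):

| /*
|  * Toy sketch of option (b); names are made up for illustration and the
|  * privilege check is a guess based on the address layout.
|  */
| #include <stdbool.h>
| #include <stdint.h>
| #include <stdio.h>
| #include <linux/perf_event.h>
| 
| /* Stand-in for a perf event's requested branch filtering. */
| struct toy_event {
| 	uint64_t branch_sample_type;
| };
| 
| /* HW side: program the shared unit with the union of all requests. */
| static uint64_t merged_branch_type(const struct toy_event *evs, int nr)
| {
| 	uint64_t merged = 0;
| 
| 	for (int i = 0; i < nr; i++)
| 		merged |= evs[i].branch_sample_type;
| 	return merged;
| }
| 
| /* Assumption: kernel addresses have the top bit set. */
| static bool is_kernel_addr(uint64_t addr)
| {
| 	return addr & (1ULL << 63);
| }
| 
| /* SW side: drop records that a given event did not ask for. */
| static bool record_matches_event(const struct perf_branch_entry *br,
| 				 const struct toy_event *ev)
| {
| 	bool kernel = is_kernel_addr(br->from) || is_kernel_addr(br->to);
| 
| 	if (kernel)
| 		return ev->branch_sample_type & PERF_SAMPLE_BRANCH_KERNEL;
| 	return ev->branch_sample_type & PERF_SAMPLE_BRANCH_USER;
| }
| 
| int main(void)
| {
| 	struct toy_event evs[2] = {
| 		{ PERF_SAMPLE_BRANCH_ANY | PERF_SAMPLE_BRANCH_USER },
| 		{ PERF_SAMPLE_BRANCH_ANY | PERF_SAMPLE_BRANCH_KERNEL },
| 	};
| 	/* A made-up kernel->kernel branch record. */
| 	struct perf_branch_entry rec = {
| 		.from = 0xffff800080010000ULL,
| 		.to   = 0xffff800080010040ULL,
| 	};
| 
| 	printf("HW filter: %#llx\n",
| 	       (unsigned long long)merged_branch_type(evs, 2));
| 	printf("record passes user event: %d, kernel event: %d\n",
| 	       record_matches_event(&rec, &evs[0]),
| 	       record_matches_event(&rec, &evs[1]));
| 	return 0;
| }

.. i.e. the shared unit gets programmed with the union of the filters,
and each event only gets the records it actually asked for.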

Mark.
