lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAL_JsqKHGROJa1EW94iy1XzCadEst-hPWZY2BmxKgMB93nDp4w@mail.gmail.com>
Date: Wed, 5 Feb 2025 10:15:54 -0600
From: Rob Herring <robh@...nel.org>
To: James Clark <james.clark@...aro.org>, Mark Rutland <mark.rutland@....com>
Cc: linux-arm-kernel@...ts.infradead.org, linux-perf-users@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, 
	kvmarm@...ts.linux.dev, Will Deacon <will@...nel.org>, 
	Catalin Marinas <catalin.marinas@....com>, Jonathan Corbet <corbet@....net>, Marc Zyngier <maz@...nel.org>, 
	Oliver Upton <oliver.upton@...ux.dev>, Joey Gouly <joey.gouly@....com>, 
	Suzuki K Poulose <suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>, 
	Anshuman Khandual <anshuman.khandual@....com>
Subject: Re: [PATCH v19 11/11] perf: arm_pmuv3: Add support for the Branch
 Record Buffer Extension (BRBE)

On Wed, Feb 5, 2025 at 8:38 AM James Clark <james.clark@...aro.org> wrote:
> On 04/02/2025 3:03 pm, Rob Herring wrote:
> > On Tue, Feb 4, 2025 at 6:03 AM James Clark <james.clark@...aro.org> wrote:
> >> On 03/02/2025 5:58 pm, Rob Herring wrote:
> >>> On Mon, Feb 3, 2025 at 10:53 AM James Clark <james.clark@...aro.org> wrote:
> >>>> On 03/02/2025 12:43 am, Rob Herring (Arm) wrote:
> >>>>> From: Anshuman Khandual <anshuman.khandual@....com>
> >>>>>
> >>>>> The ARMv9.2 architecture introduces the optional Branch Record Buffer
> >>>>> Extension (BRBE), which records information about branches as they are
> >>>>> executed into set of branch record registers. BRBE is similar to x86's
> >>>>> Last Branch Record (LBR) and PowerPC's Branch History Rolling Buffer
> >>>>> (BHRB).
> >
> > [...]
> >
> >>>>> +     /*
> >>>>> +      * Require that the event filter and branch filter permissions match.
> >>>>> +      *
> >>>>> +      * The event and branch permissions can only mismatch if the user set
> >>>>> +      * at least one of the privilege branch filters in PERF_SAMPLE_BRANCH_PLM_ALL.
> >>>>> +      * Otherwise, the core will set the branch sample permissions in
> >>>>> +      * perf_copy_attr().
> >>>>> +      */
> >>>>> +     if ((event->attr.exclude_user != !(branch_type & PERF_SAMPLE_BRANCH_USER)) ||
> >>>>> +         (event->attr.exclude_kernel != !(branch_type & PERF_SAMPLE_BRANCH_KERNEL)) ||
> >>>>
> >>>> I don't think this one is right. By default perf_copy_attr() copies the
> >>>> exclude_ settings into the branch settings so this works, but if the
> >>>> user sets any _less_ permissive branch setting this fails. For example:
> >>>>
> >>>>     # perf record -j any,u -- true
> >>>>     Error:
> >>>>     cycles:PH: PMU Hardware or event type doesn't support branch stack
> >>>>     sampling.
> >>>>
> >>>> Here we want the default sampling permissions (exclude_kernel == 0,
> >>>> exclude_user == 0), but only user branch records, which doesn't match.
> >>>> It should be allowed because it doesn't include anything that we're not
> >>>> allowed to see.
> >>>
> >>> I know it is allowed(on x86), but why would we want that? If you do
> >>> something even more restricted:
> >>>
> >>> perf record -e cycles:k -j any,u -- true
> >>>
> >>> That's allowed on x86 and gives you samples with user addresses. But
> >>> all the events happened in the kernel. How does that make any sense?
> >>>
> >>> I suppose in your example, we could avoid attaching branch stack on
> >>> samples from the kernel. However, given how my example works, I'm
> >>> pretty sure that's not what x86 does.
> >>>
> >>> There's also combinations that are allowed, but record no samples.
> >>> Though I think that was with guest events. I've gone with reject
> >>> non-sense combinations as much as possible. We can easily remove those
> >>> restrictions later if needed. Changing the behavior later (for the
> >>> same configuration) wouldn't be good.
> >>>
> >>>
> >>
> >> Rejecting ones that produce no samples is fair enough, but my example
> >> does produce samples. To answer the question "why would we want that?",
> >> nothing major, but there are a few small reasons:
> >>
> >>    * Perf includes both user and kernel by default, so the shortest
> >>      command to only gather user branches doesn't work (-j any,u)
> >>    * The test already checks for branch stack support like this, so old
> >>      Perf test versions don't work
> >
> > I would be more concerned about this one except that *we* wrote that
> > test. (I'm not sure why we wrote a new test rather than adapting
> > record_lbr.sh...)
> >
>
> record_lbr.sh was added 6 months ago, test_brstack.sh 3 years ago so
> it's the other way around.

Sigh...

> Although record_lbr.sh also tests --call-graph and --stitch-lbr as well,
> so I think it's fine for test_brstack.sh to test only --branch-filter
> options at the lowest level.
>
> Looking at that test though I see there is a capability
> "/sys/devices/cpu/caps/branches". I'm wondering whether we should be
> adding that on the Arm PMU for BRBE?

I noticed that too. I suppose we should. Though I suppose that could
give weird results if userspace is expecting LBR. Adding that would
make record_lbr.sh run and then the LBR callgraph test is going to
fail.

> Ignoring the tests, the man pages (and some pages on the internet) give
> this example: "--branch-filter any_ret,u,k". This doesn't work either
> because it doesn't match the default exclude_hv option. It just seems a
> bit awkward and incompatible to me, for not much gain.
>
> >>    * You might only be optimising userspace, but still interested in the
> >>      proportion of time spent or particular place in the kernel
> >
> > How do you see that? It looks completely misleading to me. 'perf
> > report' seems to only list branch stack addresses in this case. There
> > doesn't seem to be any matching of the event address to branch stack
> > addresses.
> >
>
> Perf script will show everything with all it's various options, or
> --branch-history on perf report will show both too. Also there are tools
> other than Perf, AutoFDO seems like something that BRBE can be used with.
>
> >>    * Consistency with existing implementations and for people porting
> >>      existing tools to Arm
> >>    * It doesn't cost anything to support it (I think we just
> >>      only check if exclude_* is set rather than !=)
> >>    * Permissions checks should be handled by the core code so that
> >>      they're consistent
> >>    * What's the point of separate branch filters anyway if they always
> >>      have to match the event filter?
> >
> > IDK, I wish someone could tell me. I don't see the usecase for them
> > being mismatched.
> >
> > In any case, I don't care too much one way or the other what we do
> > here. If everyone thinks we should relax this, then that's fine with
> > me.
> >
>
> Seeing the branch history from userspace that led up to a certain thing
> in the kernel happening doesn't seem like that much of an edge case to
> me. If you always have to have both on then you lose the userspace
> branch history because the buffer isn't that big and gets overwritten.

Okay, let's drop this check...

> >> Some of these things could be fixed in Perf, but not in older versions.
> >> Even if we can't think of a real use case now, it doesn't sound like the
> >> driver should be so restrictive of an option that doesn't do any harm.
> >>
> >>>> This also makes the Perf branch test skip because it uses
> >>>> any,save_type,u to see if BRBE exists.
> >>>
> >>> Yes, I plan to update that if we keep this behavior.
> >>>
> >>>>> +         (!is_kernel_in_hyp_mode() &&
> >>>>> +          (event->attr.exclude_hv != !(branch_type & PERF_SAMPLE_BRANCH_HV))))
> >>>>> +             return false;
> >>>>> +
> >>>>> +     event->hw.branch_reg.config = branch_type_to_brbfcr(event->attr.branch_sample_type);
> >>>>> +     event->hw.extra_reg.config = branch_type_to_brbcr(event->attr.branch_sample_type);
> >>>>> +
> >>>>> +     return true;
> >>>>> +}
> >>>>> +
> >>>> [...]
> >>>>> +static const int brbe_type_to_perf_type_map[BRBINFx_EL1_TYPE_DEBUG_EXIT + 1][2] = {
> >>>>> +     [BRBINFx_EL1_TYPE_DIRECT_UNCOND] = { PERF_BR_UNCOND, 0 },
> >>>>
> >>>> Does the second field go into 'new_type'? They all seem to be zero so
> >>>> I'm not sure why new_type isn't ignored instead of having it mapped.
> >>>
> >>> Well, left over from when all the Arm specific types were supported.
> >>> So yeah, that can be simplified.
> >>>
> >>>>> +     [BRBINFx_EL1_TYPE_INDIRECT] = { PERF_BR_IND, 0 },
> >>>>> +     [BRBINFx_EL1_TYPE_DIRECT_LINK] = { PERF_BR_CALL, 0 },
> >>>>> +     [BRBINFx_EL1_TYPE_INDIRECT_LINK] = { PERF_BR_IND_CALL, 0 },
> >>>>> +     [BRBINFx_EL1_TYPE_RET] = { PERF_BR_RET, 0 },
> >>>>> +     [BRBINFx_EL1_TYPE_DIRECT_COND] = { PERF_BR_COND, 0 },
> >>>>> +     [BRBINFx_EL1_TYPE_CALL] = { PERF_BR_CALL, 0 },
> >>>>> +     [BRBINFx_EL1_TYPE_ERET] = { PERF_BR_ERET, 0 },
> >>>>> +     [BRBINFx_EL1_TYPE_IRQ] = { PERF_BR_IRQ, 0 },
> >>>>
> >>>> How do ones that don't map to anything appear in Perf? For example
> >>>> BRBINFx_EL1_TYPE_TRAP is missing, and the test that was attached to the
> >>>> previous versions fails because it doesn't see the trap that jumps to
> >>>> the kernel, but it does still see the ERET back to userspace:
> >>>>
> >>>>      [unknown]/trap_bench+0x20/-/-/-/0/ERET/-
> >>>>
> >>>> In older versions we'd also have BRBINFx_EL1_TYPE_TRAP mapping to
> >>>> PERF_BR_SYSCALL so you could see it go into the kernel before the return:
> >>>>
> >>>>      trap_bench+0x1C/[unknown]/-/-/-/0/SYSCALL/-
> >>>>      [unknown]/trap_bench+0x20/-/-/-/0/ERET/-
> >>>
> >>> My read of that was we should see a CALL in this case. Whether SVC
> >>> generates a TRAP or CALL depends on HFGITR_EL2.SVC_EL0 (table D18-2).
> >>> I assumed "SVC due to HFGITR_EL2.SVC_EL0" means when SVC_EL0 is set
> >>> (and set has additional conditions). We have SVC_EL0 cleared, so that
> >>> should be a CALL. Maybe the FVP has this wrong?
> >>>
> >>
> >> The test is doing this rather than a syscall:
> >>
> >>     asm("mrs %0, ID_AA64ISAR0_EL1" : "=r" (val));   /* TRAP + ERET */
> >>
> >> So I think trap is right. Whether that should be mapped to SYSCALL or
> >> some other branch type I don't know, but the point is that it's missing now.
> >
> > We aren't supporting any of the Arm specific traps/exceptions. One
> > reason is for consistency with x86 like you just argued for. The only
>
> Does x86 leave holes in the program flow though, or is it complete? IMO
> it makes it harder for tools to make sense of the branch buffer if there
> are things like an ERET with no previous trap to match it up to.

I'll have to test that. x86 has SYSRET for "syscall return". We added
ERET which maps to x86 interrupt return. So I guess x86 only records
syscalls and their returns. There's also "sw interrupt" on x86 which
gets mapped to PERF_BR_UNKNOWN. I don't think there's any way for us
to distinguish a syscall return from any other exception return.

> > exception types supported are syscall and IRQ. Part of the issue is
> > there is no userspace control over enabling all the extra Arm ones.
> > There's no way to say enable all branches except debug, fault, etc.
> > exceptions. If we want to support these, I think there should be user
> > control over enabling them. But that can come later if there's any
> > demand for them.
> >
> > Rob
>
> In this patchset we enable PERF_BR_IRQ with PERF_SAMPLE_BRANCH_ANY,
> without any way to selectively disable it. I would assume trap could be
> done with the same option.

If I was designing the interface, I would make PERF_BR_IRQ separately
controllable. But we're kind of stuck with what x86 did. I suppose we
could add a negative 'noirq' option.

Are you of the opinion that we should enable everything or some subset
of them? There's basically inst/data/algn faults, FIQ, SError, and
debug. The debug ones seem questionable to me, or at least ones you'd
want to opt-in for. For FIQ, if that's used by secure world, do we
want non-secure world recording when FIQs happen? Could the timing of
those be used maliciously?

> If we're filtering some of them out it might be worth documenting that
> "PERF_SAMPLE_BRANCH_ANY" doesn't actually mean 'any' branch type on Arm,
> and some types are recorded but discarded out before sending to userspace.
>
> There could be some confusion when there are partially filled or empty
> branch buffers, and the reason wouldn't be that there weren't any
> branches recorded, but they were all filtered out even with the 'any'
> option.

Fair enough. I think we need Mark to chime in here. He was questioning
the need for these.

Rob

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ