[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250224180301.GI8144@e132581.arm.com>
Date: Mon, 24 Feb 2025 18:03:01 +0000
From: Leo Yan <leo.yan@....com>
To: Mark Rutland <mark.rutland@....com>
Cc: Rob Herring <robh@...nel.org>, Will Deacon <will@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Jonathan Corbet <corbet@....net>, Marc Zyngier <maz@...nel.org>,
Oliver Upton <oliver.upton@...ux.dev>,
Joey Gouly <joey.gouly@....com>,
Suzuki K Poulose <suzuki.poulose@....com>,
Zenghui Yu <yuzenghui@...wei.com>,
James Clark <james.clark@...aro.org>,
Anshuman Khandual <anshuman.khandual@....com>,
linux-arm-kernel@...ts.infradead.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, kvmarm@...ts.linux.dev
Subject: Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch
Record Buffer Extension (BRBE)
On Mon, Feb 24, 2025 at 04:05:43PM +0000, Mark Rutland wrote:
[...]
> > > > Just a minor concern. I don't see any handling for task migration.
> > > > E.g., for a task is migrated from one CPU to another CPU, I expect we
> > > > need to save and restore branch records based on BRBE injection. So
> > > > far, the driver simply invalidates all records.
> > > >
> > > > I think this topic is very likely discussed before. If this is the
> > > > case, please ignore my comment. Except this, the code looks good
> > > > to me.
> > >
> > > Not really discussed on the list, but that was present in v18 (though
> > > not functional because .sched_task() hook wasn't actually enabled) and
> > > Mark removed it. His work is here[1].The only comment was:
> > >
> > > Note: saving/restoring at context-switch doesn't interact well with
> > > event rotation (e.g. if filters change)
> >
> > In the brbe_enable() function, it "Merge the permitted branch filters
> > of all events". Based on current implementation, all events share the
> > same branch filter.
>
> Critically, the brbe_enable() function merges the filters of all
> *active* events which have been installed into hardware. It does not
> track all events which can be rotated, and the resulting filter is not
> the same -- it can change as a result of rotation.
In a perf session has multiple events, and events have different branch
filters, seems to me, a simple way is to return error for this case.
I would argue BRBE is an IP for recording branches per CPU wise, it does
not support recording for event wise. If we can unify branch filter
within a perf session, would this be much easier for handling?
> > When event rotation happens, if without context switch, in theory we
> > should can directly use the branch record (no invalidation, no injection)
> > for all events.
>
> No; that only works in *some* cases, and will produce incorrect results
> in others.
>
> For example, consider filtering. Imagine a PMU with a single counter,
> and two events, where event-A filters for calls-and-returns and event-B
> filters for calls-only. When switching from event-A to event-B, it's
> theoretically possible to keep the existing records around, knowing that
> the returns can be filtered out later. When switching from event-B to
> event-A we cannot keep the existing records, since there are gaps
> whenever a return should have been recorded.
Seems to me, the problem is not caused by event rotation. We need to
calculate a correct filter in the first place - the BRBE driver should
calculate a superset for all filters of events for a session. Then,
generate branch record based event's specific filter.
> There are a number of cases of that shape given the set of configurable
> filters. In theory it's possible to retain those in some cases, but I
> don't think that the complexity is justified.
>
> Similarly, whenever kernel branches are recorded it's necessary to drop
> the stale branches whenever branch recording is paused, as there's
> necessarily a blackout period and hence a gap in the records.
If we save BRBE record when a process is switched out and then restore
the record when a process is switched in, should we can keep a decent
branch record for performance profiling?
I understand it might be many things happen in the middle of a task
switching or migration, but it is fine for not recording branches during
the blackout period. The missed branch records are not very helpful for
forming a flow for a profiled program itself, especially, if we
consider we will optimize userspace program in many cases.
> Do you think that you have a case where losing branches across rotation
> *really* matters?
I agreed that event rotation case might be rare and complex. Please see
a comment below.
> > For a context-switch case, we need to save and re-inject branch record.
> > BRBE record sticks to a process context, no matter what events have been
> > enabled.
>
> I had originally wanted to keep per-event records around, but it doesn't
> work in all cases. One reason events get discarded at context-switch
> time is that CPU-bound events can sample branches, and would
> mis-attribute stale userspace branches to the wrong context when
> switching tasks. There are explicit comments about this in
> amd_pmu_brs_sched_task() and intel_pmu_lbr_sched_task().
>
> Given we discard records when reprogramming events, we *could* try to
> preserve events in some cases, but I suspect that as with the rotation
> case this'll be a lot of complexity for little gain. Note that as we
> discard events when enabling the PMU, we'd throw some task-bound records
> away anyway, and practically the gain would be limited to cpu-bound
> records.
>
> Do you have a reason why you think we *must* keep events around?
Here I am really concerned are cases when a process is preempted or
migrated. The driver doesn't save and restore branch records for these
cases, it just invalidates all records when a task is scheduled in.
As a result, if an event overflow is close to context switch, it is
likely to capture incomplete branch records. For a userspace-only
tracing, it is risk to capture empty branch record after preemption
and migrations.
Thanks,
Leo
Powered by blists - more mailing lists