[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZJL346ZgQRcHNA7E@FVFF77S0Q05N>
Date: Wed, 21 Jun 2023 14:15:15 +0100
From: Mark Rutland <mark.rutland@....com>
To: Anshuman Khandual <anshuman.khandual@....com>
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
will@...nel.org, catalin.marinas@....com,
Mark Brown <broonie@...nel.org>,
James Clark <james.clark@....com>,
Rob Herring <robh@...nel.org>, Marc Zyngier <maz@...nel.org>,
Suzuki Poulose <suzuki.poulose@....com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
linux-perf-users@...r.kernel.org
Subject: Re: [PATCH V12 08/10] arm64/perf: Add struct brbe_regset helper
functions
Hi Anshuman,
Thanks, this is looking much better; I just a have a couple of minor comments.
With those fixed up:
Acked-by: Mark Rutland <mark.rutland@....com>
Mark.
On Thu, Jun 15, 2023 at 07:02:37PM +0530, Anshuman Khandual wrote:
> The primary abstraction level for fetching branch records from BRBE HW has
> been changed as 'struct brbe_regset', which contains storage for all three
> BRBE registers i.e BRBSRC, BRBTGT, BRBINF. Whether branch record processing
> happens in the task sched out path, or in the PMU IRQ handling path, these
> registers need to be extracted from the HW. Afterwards both live and stored
> sets need to be stitched together to create final branch records set. This
> adds required helper functions for such operations.
>
> Cc: Catalin Marinas <catalin.marinas@....com>
> Cc: Will Deacon <will@...nel.org>
> Cc: Mark Rutland <mark.rutland@....com>
> Cc: linux-arm-kernel@...ts.infradead.org
> Cc: linux-kernel@...r.kernel.org
> Tested-by: James Clark <james.clark@....com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@....com>
> ---
> drivers/perf/arm_brbe.c | 127 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 127 insertions(+)
>
> diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
> index 4729cb49282b..f6693699fade 100644
> --- a/drivers/perf/arm_brbe.c
> +++ b/drivers/perf/arm_brbe.c
> @@ -44,6 +44,133 @@ static void select_brbe_bank(int bank)
> isb();
> }
>
> +static bool __read_brbe_regset(struct brbe_regset *entry, int idx)
> +{
> + entry->brbinf = get_brbinf_reg(idx);
> +
> + /*
> + * There are no valid entries anymore on the buffer.
> + * Abort the branch record processing to save some
> + * cycles and also reduce the capture/process load
> + * for the user space as well.
> + */
This comment refers to the process of handling multiple entries, though it's
only handling one entry, and I don't think we need to mention saving cycles here.
Could we please delete this comment entirely? The comment above
capture_brbe_regset() already explains that we read until the first invalid
entry.
> + if (brbe_invalid(entry->brbinf))
> + return false;
> +
> + entry->brbsrc = get_brbsrc_reg(idx);
> + entry->brbtgt = get_brbtgt_reg(idx);
> + return true;
> +}
> +
> +/*
> + * This scans over BRBE register banks and captures individual branch records
> + * [BRBSRC, BRBTGT, BRBINF] into a pre-allocated 'struct brbe_regset' buffer,
> + * until an invalid one gets encountered. The caller for this function needs
> + * to ensure BRBE is an appropriate state before the records can be captured.
> + */
Could we simplify this to:
/*
* Read all BRBE entries in HW until the first invalid entry.
*
* The caller must ensure that the BRBE is not concurrently modifying these
* entries.
*/
> +static int capture_brbe_regset(int nr_hw_entries, struct brbe_regset *buf)
> +{
> + int idx = 0;
> +
> + select_brbe_bank(BRBE_BANK_IDX_0);
> + while (idx < nr_hw_entries && idx < BRBE_BANK0_IDX_MAX) {
> + if (!__read_brbe_regset(&buf[idx], idx))
> + return idx;
> + idx++;
> + }
> +
> + select_brbe_bank(BRBE_BANK_IDX_1);
> + while (idx < nr_hw_entries && idx < BRBE_BANK1_IDX_MAX) {
> + if (!__read_brbe_regset(&buf[idx], idx))
> + return idx;
> + idx++;
> + }
> + return idx;
> +}
> +
> +/*
> + * This function concatenates branch records from stored and live buffer
> + * up to maximum nr_max records and the stored buffer holds the resultant
> + * buffer. The concatenated buffer contains all the branch records from
> + * the live buffer but might contain some from stored buffer considering
> + * the maximum combined length does not exceed 'nr_max'.
> + *
> + * Stored records Live records
> + * ------------------------------------------------^
> + * | S0 | L0 | Newest |
> + * --------------------------------- |
> + * | S1 | L1 | |
> + * --------------------------------- |
> + * | S2 | L2 | |
> + * --------------------------------- |
> + * | S3 | L3 | |
> + * --------------------------------- |
> + * | S4 | L4 | nr_max
> + * --------------------------------- |
> + * | | L5 | |
> + * --------------------------------- |
> + * | | L6 | |
> + * --------------------------------- |
> + * | | L7 | |
> + * --------------------------------- |
> + * | | | |
> + * --------------------------------- |
> + * | | | Oldest |
> + * ------------------------------------------------V
> + *
> + *
> + * S0 is the newest in the stored records, where as L7 is the oldest in
> + * the live records. Unless the live buffer is detected as being full
> + * thus potentially dropping off some older records, L7 and S0 records
> + * are contiguous in time for a user task context. The stitched buffer
> + * here represents maximum possible branch records, contiguous in time.
> + *
> + * Stored records Live records
> + * ------------------------------------------------^
> + * | L0 | L0 | Newest |
> + * --------------------------------- |
> + * | L0 | L1 | |
> + * --------------------------------- |
> + * | L2 | L2 | |
> + * --------------------------------- |
> + * | L3 | L3 | |
> + * --------------------------------- |
> + * | L4 | L4 | nr_max
> + * --------------------------------- |
> + * | L5 | L5 | |
> + * --------------------------------- |
> + * | L6 | L6 | |
> + * --------------------------------- |
> + * | L7 | L7 | |
> + * --------------------------------- |
> + * | S0 | | |
> + * --------------------------------- |
> + * | S1 | | Oldest |
> + * ------------------------------------------------V
> + * | S2 | <----|
> + * ----------------- |
> + * | S3 | <----| Dropped off after nr_max
> + * ----------------- |
> + * | S4 | <----|
> + * -----------------
> + */
> +static int stitch_stored_live_entries(struct brbe_regset *stored,
> + struct brbe_regset *live,
> + int nr_stored, int nr_live,
> + int nr_max)
> +{
> + int nr_move = min(nr_stored, nr_max - nr_live);
> +
> + /* Move the tail of the buffer to make room for the new entries */
> + memmove(&stored[nr_live], &stored[0], nr_move * sizeof(*stored));
> +
> + /* Copy the new entries into the head of the buffer */
> + memcpy(&stored[0], &live[0], nr_live * sizeof(*stored));
> +
> + /* Return the number of entries in the stitched buffer */
> + return min(nr_live + nr_stored, nr_max);
> +}
> +
> /*
> * Generic perf branch filters supported on BRBE
> *
> --
> 2.25.1
>
Powered by blists - more mailing lists