[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fU9ovvb-JopPqQfNaj6xtL=u_WZO-b56RdhBmUw4mY0ZA@mail.gmail.com>
Date: Thu, 20 Feb 2025 10:51:16 -0800
From: Ian Rogers <irogers@...gle.com>
To: Rajnesh Kanwal <rkanwal@...osinc.com>, "Liang, Kan" <kan.liang@...ux.intel.com>
Cc: linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org,
linux-perf-users@...r.kernel.org, adrian.hunter@...el.com,
alexander.shishkin@...ux.intel.com, ajones@...tanamicro.com,
anup@...infault.org, acme@...nel.org, atishp@...osinc.com,
beeman@...osinc.com, brauner@...nel.org, conor@...nel.org, heiko@...ech.de,
mingo@...hat.com, james.clark@....com, renyu.zj@...ux.alibaba.com,
jolsa@...nel.org, jisheng.teoh@...rfivetech.com, palmer@...belt.com,
will@...nel.org, kaiwenxue1@...il.com, vincent.chen@...ive.com
Subject: Re: [PATCH v2 1/7] perf: Increase the maximum number of samples to 256.
On Thu, Jan 16, 2025 at 3:10 PM Rajnesh Kanwal <rkanwal@...osinc.com> wrote:
>
> RISCV CTR extension support a maximum depth of 256 last branch records.
> The 127 entries limit results in corrupting CTR entries for RISC-V if
> configured to be 256 entries. This will not impact any other architectures
> as it is just increasing maximum limit of possible entries.
I wonder if rather than a constant this code should just use the auto
resizing hashmap code?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/hashmap.h
I assume the value of 127 comes from perf_event.h's PERF_MAX_STACK_DEPTH:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/perf_event.h#n1252
Perhaps these constants shouldn't exist. The perf-record man page mentions:
sysctl.kernel.perf_event_max_stack
which I believe gets a value from
/proc/sys/kernel/perf_event_max_stack, so maybe these should be
runtime determined constants rather than compile time.
Thanks,
Ian
> Signed-off-by: Rajnesh Kanwal <rkanwal@...osinc.com>
> ---
> tools/perf/util/machine.c | 21 ++++++++++++++-------
> 1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 27d5345d2b30..f2eb3c20274e 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2174,25 +2174,32 @@ static void save_iterations(struct iterations *iter,
> iter->cycles += be[i].flags.cycles;
> }
>
> -#define CHASHSZ 127
> -#define CHASHBITS 7
> -#define NO_ENTRY 0xff
> +#define CHASHBITS 8
> +#define NO_ENTRY 0xffU
>
> -#define PERF_MAX_BRANCH_DEPTH 127
> +#define PERF_MAX_BRANCH_DEPTH 256
>
> /* Remove loops. */
> +/* Note: Last entry (i==ff) will never be checked against NO_ENTRY
> + * so it's safe to have an unsigned char array to process 256 entries
> + * without causing clash between last entry and NO_ENTRY value.
> + */
> static int remove_loops(struct branch_entry *l, int nr,
> struct iterations *iter)
> {
> int i, j, off;
> - unsigned char chash[CHASHSZ];
> + unsigned char chash[PERF_MAX_BRANCH_DEPTH];
>
> memset(chash, NO_ENTRY, sizeof(chash));
>
> - BUG_ON(PERF_MAX_BRANCH_DEPTH > 255);
> + BUG_ON(PERF_MAX_BRANCH_DEPTH > 256);
>
> for (i = 0; i < nr; i++) {
> - int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
> + /* Remainder division by PERF_MAX_BRANCH_DEPTH is not
> + * needed as hash_64 will anyway limit the hash
> + * to CHASHBITS
> + */
> + int h = hash_64(l[i].from, CHASHBITS);
>
> /* no collision handling for now */
> if (chash[h] == NO_ENTRY) {
> --
> 2.34.1
>
Powered by blists - more mailing lists