lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fUy0QfmazuNq1yJzAxsuM7wD3zD=KAVMGHuw0wXvez1ww@mail.gmail.com>
Date: Wed, 21 May 2025 08:36:09 -0700
From: Ian Rogers <irogers@...gle.com>
To: Rajnesh Kanwal <rkanwal@...osinc.com>
Cc: ak@...ux.intel.com, "Liang, Kan" <kan.liang@...ux.intel.com>, 
	linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org, 
	linux-perf-users@...r.kernel.org, adrian.hunter@...el.com, 
	alexander.shishkin@...ux.intel.com, ajones@...tanamicro.com, 
	anup@...infault.org, acme@...nel.org, atishp@...osinc.com, 
	beeman@...osinc.com, brauner@...nel.org, conor@...nel.org, heiko@...ech.de, 
	mingo@...hat.com, james.clark@....com, renyu.zj@...ux.alibaba.com, 
	jolsa@...nel.org, jisheng.teoh@...rfivetech.com, palmer@...belt.com, 
	will@...nel.org, kaiwenxue1@...il.com, vincent.chen@...ive.com
Subject: Re: [PATCH v2 1/7] perf: Increase the maximum number of samples to 256.

On Wed, May 21, 2025 at 3:47 AM Rajnesh Kanwal <rkanwal@...osinc.com> wrote:
>
> On Thu, Apr 17, 2025 at 1:51 PM Rajnesh Kanwal <rkanwal@...osinc.com> wrote:
> >
> > + Adding Andi Kleen.
> >
> > On Thu, Feb 20, 2025 at 6:51 PM Ian Rogers <irogers@...gle.com> wrote:
> > >
> > > On Thu, Jan 16, 2025 at 3:10 PM Rajnesh Kanwal <rkanwal@...osinc.com> wrote:
> > > >
> > > > RISCV CTR extension support a maximum depth of 256 last branch records.
> > > > The 127 entries limit results in corrupting CTR entries for RISC-V if
> > > > configured to be 256 entries. This will not impact any other architectures
> > > > as it is just increasing maximum limit of possible entries.
> > >
> > > I wonder if rather than a constant this code should just use the auto
> > > resizing hashmap code?
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/hashmap.h
> > >
> > > I assume the value of 127 comes from perf_event.h's PERF_MAX_STACK_DEPTH:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/perf_event.h#n1252
> > >
> > > Perhaps these constants shouldn't exist. The perf-record man page mentions:
> > > sysctl.kernel.perf_event_max_stack
> > > which I believe gets a value from
> > > /proc/sys/kernel/perf_event_max_stack, so maybe these should be
> > > runtime determined constants rather than compile time.
>
> While working on this, I came across the following two patches. It
> looks like what
> you have suggested, it was tried before but later on Arnaldo reverted the change
> from report and script cmds due to reasons mentioned in the second patch.
>
> https://lore.kernel.org/lkml/1461767472-8827-31-git-send-email-acme@kernel.org/
> https://lore.kernel.org/lkml/1463696493-27528-8-git-send-email-acme@kernel.org/

Thanks Rajnash, agreed on what you found. I wonder to resolve the
issue we could add a header feature:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/header.h?h=perf-tools-next#n21
for max stack depth.

Thanks,
Ian

> Regards
> Rajnesh
>
>
> > >
> >
> > Thanks Ian for your feedback. I am not sure if it's feasible to use auto
> > resizing hashmap here. On each sample of 256 entries we will be doing
> > 6 callocs and transferring a whole lot of entries in hashmap_grow. We
> > can't reuse old hashmap as well. On each sample we bear the same cost
> >
> > But I do agree this should be more dynamic but the maximum number
> > of entries remove_loops can process is limited by the type of chash array
> > here. I can change it and related logic to use uint16_t or higher but we
> > will still have a cap on the number of entries.
> >
> > PERF_MAX_BRANCH_DEPTH seems to be denoting what remove_loops
> > can process. This is being used by thread__resolve_callchain_sample to
> > check if the sample is processable before calling remove_loops. I think
> > this can't be changed to use perf_event_max_stack. But I can rename
> > this macro to avoid confusion.
> >
> > I didn't notice PERF_MAX_STACK_DEPTH. This seems to be defined in
> > multiple places and touches bpf as well. I agree that we should avoid
> > using this macro and use runtime determined value instead. Tbh I don't
> > have super in-depth perf understanding. I will give this a try and send a
> > patch in the next update. It would be helpful if you can review it.
> >
> > Thanks
> > -Rajnesh
> >
> > > Thanks,
> > > Ian
> > >
> > > > Signed-off-by: Rajnesh Kanwal <rkanwal@...osinc.com>
> > > > ---
> > > >  tools/perf/util/machine.c | 21 ++++++++++++++-------
> > > >  1 file changed, 14 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > index 27d5345d2b30..f2eb3c20274e 100644
> > > > --- a/tools/perf/util/machine.c
> > > > +++ b/tools/perf/util/machine.c
> > > > @@ -2174,25 +2174,32 @@ static void save_iterations(struct iterations *iter,
> > > >                 iter->cycles += be[i].flags.cycles;
> > > >  }
> > > >
> > > > -#define CHASHSZ 127
> > > > -#define CHASHBITS 7
> > > > -#define NO_ENTRY 0xff
> > > > +#define CHASHBITS 8
> > > > +#define NO_ENTRY 0xffU
> > > >
> > > > -#define PERF_MAX_BRANCH_DEPTH 127
> > > > +#define PERF_MAX_BRANCH_DEPTH 256
> > > >
> > > >  /* Remove loops. */
> > > > +/* Note: Last entry (i==ff) will never be checked against NO_ENTRY
> > > > + * so it's safe to have an unsigned char array to process 256 entries
> > > > + * without causing clash between last entry and NO_ENTRY value.
> > > > + */
> > > >  static int remove_loops(struct branch_entry *l, int nr,
> > > >                         struct iterations *iter)
> > > >  {
> > > >         int i, j, off;
> > > > -       unsigned char chash[CHASHSZ];
> > > > +       unsigned char chash[PERF_MAX_BRANCH_DEPTH];
> > > >
> > > >         memset(chash, NO_ENTRY, sizeof(chash));
> > > >
> > > > -       BUG_ON(PERF_MAX_BRANCH_DEPTH > 255);
> > > > +       BUG_ON(PERF_MAX_BRANCH_DEPTH > 256);
> > > >
> > > >         for (i = 0; i < nr; i++) {
> > > > -               int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
> > > > +               /* Remainder division by PERF_MAX_BRANCH_DEPTH is not
> > > > +                * needed as hash_64 will anyway limit the hash
> > > > +                * to CHASHBITS
> > > > +                */
> > > > +               int h = hash_64(l[i].from, CHASHBITS);
> > > >
> > > >                 /* no collision handling for now */
> > > >                 if (chash[h] == NO_ENTRY) {
> > > > --
> > > > 2.34.1
> > > >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ