lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAECbVCui6ZgHBXBr3LdVGd16ERe0GgAnA1zy_zOQZVTU3bPoWw@mail.gmail.com>
Date: Wed, 21 May 2025 11:47:18 +0100
From: Rajnesh Kanwal <rkanwal@...osinc.com>
To: Ian Rogers <irogers@...gle.com>, ak@...ux.intel.com
Cc: "Liang, Kan" <kan.liang@...ux.intel.com>, linux-kernel@...r.kernel.org, 
	linux-riscv@...ts.infradead.org, linux-perf-users@...r.kernel.org, 
	adrian.hunter@...el.com, alexander.shishkin@...ux.intel.com, 
	ajones@...tanamicro.com, anup@...infault.org, acme@...nel.org, 
	atishp@...osinc.com, beeman@...osinc.com, brauner@...nel.org, 
	conor@...nel.org, heiko@...ech.de, mingo@...hat.com, james.clark@....com, 
	renyu.zj@...ux.alibaba.com, jolsa@...nel.org, jisheng.teoh@...rfivetech.com, 
	palmer@...belt.com, will@...nel.org, kaiwenxue1@...il.com, 
	vincent.chen@...ive.com
Subject: Re: [PATCH v2 1/7] perf: Increase the maximum number of samples to 256.

On Thu, Apr 17, 2025 at 1:51 PM Rajnesh Kanwal <rkanwal@...osinc.com> wrote:
>
> + Adding Andi Kleen.
>
> On Thu, Feb 20, 2025 at 6:51 PM Ian Rogers <irogers@...gle.com> wrote:
> >
> > On Thu, Jan 16, 2025 at 3:10 PM Rajnesh Kanwal <rkanwal@...osinc.com> wrote:
> > >
> > > RISCV CTR extension support a maximum depth of 256 last branch records.
> > > The 127 entries limit results in corrupting CTR entries for RISC-V if
> > > configured to be 256 entries. This will not impact any other architectures
> > > as it is just increasing maximum limit of possible entries.
> >
> > I wonder if rather than a constant this code should just use the auto
> > resizing hashmap code?
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/hashmap.h
> >
> > I assume the value of 127 comes from perf_event.h's PERF_MAX_STACK_DEPTH:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/perf_event.h#n1252
> >
> > Perhaps these constants shouldn't exist. The perf-record man page mentions:
> > sysctl.kernel.perf_event_max_stack
> > which I believe gets a value from
> > /proc/sys/kernel/perf_event_max_stack, so maybe these should be
> > runtime determined constants rather than compile time.

While working on this, I came across the following two patches. It
looks like what
you have suggested, it was tried before but later on Arnaldo reverted the change
from report and script cmds due to reasons mentioned in the second patch.

https://lore.kernel.org/lkml/1461767472-8827-31-git-send-email-acme@kernel.org/
https://lore.kernel.org/lkml/1463696493-27528-8-git-send-email-acme@kernel.org/

Regards
Rajnesh


> >
>
> Thanks Ian for your feedback. I am not sure if it's feasible to use auto
> resizing hashmap here. On each sample of 256 entries we will be doing
> 6 callocs and transferring a whole lot of entries in hashmap_grow. We
> can't reuse old hashmap as well. On each sample we bear the same cost
>
> But I do agree this should be more dynamic but the maximum number
> of entries remove_loops can process is limited by the type of chash array
> here. I can change it and related logic to use uint16_t or higher but we
> will still have a cap on the number of entries.
>
> PERF_MAX_BRANCH_DEPTH seems to be denoting what remove_loops
> can process. This is being used by thread__resolve_callchain_sample to
> check if the sample is processable before calling remove_loops. I think
> this can't be changed to use perf_event_max_stack. But I can rename
> this macro to avoid confusion.
>
> I didn't notice PERF_MAX_STACK_DEPTH. This seems to be defined in
> multiple places and touches bpf as well. I agree that we should avoid
> using this macro and use runtime determined value instead. Tbh I don't
> have super in-depth perf understanding. I will give this a try and send a
> patch in the next update. It would be helpful if you can review it.
>
> Thanks
> -Rajnesh
>
> > Thanks,
> > Ian
> >
> > > Signed-off-by: Rajnesh Kanwal <rkanwal@...osinc.com>
> > > ---
> > >  tools/perf/util/machine.c | 21 ++++++++++++++-------
> > >  1 file changed, 14 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > index 27d5345d2b30..f2eb3c20274e 100644
> > > --- a/tools/perf/util/machine.c
> > > +++ b/tools/perf/util/machine.c
> > > @@ -2174,25 +2174,32 @@ static void save_iterations(struct iterations *iter,
> > >                 iter->cycles += be[i].flags.cycles;
> > >  }
> > >
> > > -#define CHASHSZ 127
> > > -#define CHASHBITS 7
> > > -#define NO_ENTRY 0xff
> > > +#define CHASHBITS 8
> > > +#define NO_ENTRY 0xffU
> > >
> > > -#define PERF_MAX_BRANCH_DEPTH 127
> > > +#define PERF_MAX_BRANCH_DEPTH 256
> > >
> > >  /* Remove loops. */
> > > +/* Note: Last entry (i==ff) will never be checked against NO_ENTRY
> > > + * so it's safe to have an unsigned char array to process 256 entries
> > > + * without causing clash between last entry and NO_ENTRY value.
> > > + */
> > >  static int remove_loops(struct branch_entry *l, int nr,
> > >                         struct iterations *iter)
> > >  {
> > >         int i, j, off;
> > > -       unsigned char chash[CHASHSZ];
> > > +       unsigned char chash[PERF_MAX_BRANCH_DEPTH];
> > >
> > >         memset(chash, NO_ENTRY, sizeof(chash));
> > >
> > > -       BUG_ON(PERF_MAX_BRANCH_DEPTH > 255);
> > > +       BUG_ON(PERF_MAX_BRANCH_DEPTH > 256);
> > >
> > >         for (i = 0; i < nr; i++) {
> > > -               int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
> > > +               /* Remainder division by PERF_MAX_BRANCH_DEPTH is not
> > > +                * needed as hash_64 will anyway limit the hash
> > > +                * to CHASHBITS
> > > +                */
> > > +               int h = hash_64(l[i].from, CHASHBITS);
> > >
> > >                 /* no collision handling for now */
> > >                 if (chash[h] == NO_ENTRY) {
> > > --
> > > 2.34.1
> > >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ