linux-kernel - Re: [PATCH v3 1/6] perf maps: Switch from rbtree to lazily sorted array for addresses

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fUR7X-vyE1XgB-=B0_D59QACmKSD_gCq0Sn=dvx2GYQ4g@mail.gmail.com>
Date: Mon, 12 Feb 2024 12:37:15 -0800
From: Ian Rogers <irogers@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Song Liu <song@...nel.org>, 
	Colin Ian King <colin.i.king@...il.com>, Liam Howlett <liam.howlett@...cle.com>, 
	K Prateek Nayak <kprateek.nayak@....com>, Artem Savkov <asavkov@...hat.com>, 
	Changbin Du <changbin.du@...wei.com>, Masami Hiramatsu <mhiramat@...nel.org>, 
	Athira Rajeev <atrajeev@...ux.vnet.ibm.com>, Alexey Dobriyan <adobriyan@...il.com>, 
	James Clark <james.clark@....com>, Vincent Whitchurch <vincent.whitchurch@...s.com>, 
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org, 
	bpf@...r.kernel.org, leo.yan@...ux.dev
Subject: Re: [PATCH v3 1/6] perf maps: Switch from rbtree to lazily sorted
 array for addresses

On Mon, Feb 12, 2024 at 12:26 PM Namhyung Kim <namhyung@...nel.org> wrote:
>
> On Mon, Feb 12, 2024 at 12:19 PM Ian Rogers <irogers@...gle.com> wrote:
> >
> > On Mon, Feb 12, 2024 at 12:15 PM Namhyung Kim <namhyung@...nel.org> wrote:
> > >
> > > On Fri, Feb 9, 2024 at 7:18 PM Ian Rogers <irogers@...gle.com> wrote:
> > > >
> > > > Maps is a collection of maps primarily sorted by the starting address
> > > > of the map. Prior to this change the maps were held in an rbtree
> > > > requiring 4 pointers per node. Prior to reference count checking, the
> > > > rbnode was embedded in the map so 3 pointers per node were
> > > > necessary. This change switches the rbtree to an array lazily sorted
> > > > by address, much as the array sorting nodes by name. 1 pointer is
> > > > needed per node, but to avoid excessive resizing the backing array may
> > > > be twice the number of used elements. Meaning the memory overhead is
> > > > roughly half that of the rbtree. For a perf record with
> > > > "--no-bpf-event -g -a" of true, the memory overhead of perf inject is
> > > > reduce fom 3.3MB to 3MB, so 10% or 300KB is saved.
> > > >
> > > > Map inserts always happen at the end of the array. The code tracks
> > > > whether the insertion violates the sorting property. O(log n) rb-tree
> > > > complexity is switched to O(1).
> > > >
> > > > Remove slides the array, so O(log n) rb-tree complexity is degraded to
> > > > O(n).
> > > >
> > > > A find may need to sort the array using qsort which is O(n*log n), but
> > > > in general the maps should be sorted and so average performance should
> > > > be O(log n) as with the rbtree.
> > > >
> > > > An rbtree node consumes a cache line, but with the array 4 nodes fit
> > > > on a cache line. Iteration is simplified to scanning an array rather
> > > > than pointer chasing.
> > > >
> > > > Overall it is expected the performance after the change should be
> > > > comparable to before, but with half of the memory consumed.
> > > >
> > > > To avoid a list and repeated logic around splitting maps,
> > > > maps__merge_in is rewritten in terms of
> > > > maps__fixup_overlap_and_insert. maps_merge_in splits the given mapping
> > > > inserting remaining gaps. maps__fixup_overlap_and_insert splits the
> > > > existing mappings, then adds the incoming mapping. By adding the new
> > > > mapping first, then re-inserting the existing mappings the splitting
> > > > behavior matches.
> > > >
> > > > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > > > Acked-by: Namhyung Kim <namhyung@...nel.org>
> > > > ---
> > > [SNIP]
> > > >  int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data)
> > > >  {
> > > > -       struct map_rb_node *pos;
> > > > +       bool done = false;
> > > >         int ret = 0;
> > > >
> > > > -       down_read(maps__lock(maps));
> > > > -       maps__for_each_entry(maps, pos) {
> > > > -               ret = cb(pos->map, data);
> > > > -               if (ret)
> > > > -                       break;
> > > > +       /* See locking/sorting note. */
> > > > +       while (!done) {
> > > > +               down_read(maps__lock(maps));
> > > > +               if (maps__maps_by_address_sorted(maps)) {
> > > > +                       /*
> > > > +                        * maps__for_each_map callbacks may buggily/unsafely
> > > > +                        * insert into maps_by_address. Deliberately reload
> > > > +                        * maps__nr_maps and maps_by_address on each iteration
> > > > +                        * to avoid using memory freed by maps__insert growing
> > > > +                        * the array - this may cause maps to be skipped or
> > > > +                        * repeated.
> > > > +                        */
> > > > +                       for (unsigned int i = 0; i < maps__nr_maps(maps); i++) {
> > > > +                               struct map **maps_by_address = maps__maps_by_address(maps);
> > >
> > > Any chance they can move out of the loop?  I guess not as they are
> > > not marked to const/pure functions..
> >
> > It's not because the cb(...) call below will potentially modify
> > maps_by_address by inserting maps and reallocating the array. Having
> > it outside the loop was what caused the original bug.
>
> Oh, I meant if compiler can move them automatically.

The const (on the accessor) isn't sufficient for that, they'd perhaps
need to be restrict. maps here is neither const or restrict which
means the callback can modify it (even though it isn't an argument)
and the function here needs to reload its value.

Thanks,
Ian

> Thanks,
> Namhyung
>
> > >
> > >
> > > > +                               struct map *map = maps_by_address[i];
> > > > +
> > > > +                               ret = cb(map, data);
> > > > +                               if (ret)
> > > > +                                       break;
> > > > +                       }
> > > > +                       done = true;
> > > > +               }
> > > > +               up_read(maps__lock(maps));
> > > > +               if (!done)
> > > > +                       maps__sort_by_address(maps);
> > > >         }
> > > > -       up_read(maps__lock(maps));
> > > >         return ret;
> > > >  }