Message-ID: <CAEf4Bzaq86fPVGWtXqvxLtbsk06coGBebnAO5YiuvuUF2v7++w@mail.gmail.com>
Date: Tue, 6 Aug 2024 10:26:25 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Jiri Olsa <olsajiri@...il.com>
Cc: Andrii Nakryiko <andrii@...nel.org>, linux-trace-kernel@...r.kernel.org,
rostedt@...dmis.org, mhiramat@...nel.org, peterz@...radead.org,
oleg@...hat.com, bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] uprobes: get rid of bogus trace_uprobe hit counter
On Tue, Aug 6, 2024 at 12:37 AM Jiri Olsa <olsajiri@...il.com> wrote:
>
> On Mon, Aug 05, 2024 at 01:28:03PM -0700, Andrii Nakryiko wrote:
> > trace_uprobe->nhit counter is not incremented atomically, so its value
> > is bogus in practice. On the other hand, it's actually a pretty big
> > uprobe scalability problem due to heavy cache line bouncing between CPUs
> > triggering the same uprobe.
>
> so you're seeing that in the benchmark, right? I'm curious how bad
> the numbers are
>
Yes. Once we get rid of all the uprobe/uretprobe/mm locks (ongoing
work), this counter is the last limiter to linear scalability.
With the counter in place, I was topping out at about 12 mln/s of
uprobe triggering (I think it was 32 CPUs, but I don't remember
exactly now), with about 30% of CPU cycles spent in this increment.
But those 30% don't paint the full picture: once the counter is
removed, the same uprobe throughput jumps to 62 mln/s or so. So we
definitely have to do something about it.
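FWIW, here is a stand-alone way to see the effect (purely
hypothetical demo, not part of the patch, numbers will vary by
machine): a tiny pthread program that bumps a single shared counter
from every thread with a plain non-atomic increment, same pattern as
tu->nhit++, versus one counter per thread sitting on its own cache
line:

#include <pthread.h>
#include <string.h>

#define NTHREADS 8
#define ITERS (100 * 1000 * 1000UL)

/* one shared counter: every CPU writes the same cache line,
 * plain non-atomic increment, just like tu->nhit++ */
static volatile unsigned long shared_hits;

/* one counter per thread, padded out to its own cache line */
struct priv {
	volatile unsigned long hits;
	char pad[64 - sizeof(unsigned long)];
};
static struct priv per_thread[NTHREADS];

static void *bump_shared(void *arg)
{
	for (unsigned long i = 0; i < ITERS; i++)
		shared_hits++;	/* racy and bounces the cache line */
	return NULL;
}

static void *bump_private(void *arg)
{
	struct priv *p = arg;

	for (unsigned long i = 0; i < ITERS; i++)
		p->hits++;	/* stays in the local CPU cache */
	return NULL;
}

int main(int argc, char **argv)
{
	int shared = argc > 1 && !strcmp(argv[1], "shared");
	pthread_t tids[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL,
			       shared ? bump_shared : bump_private,
			       &per_thread[i]);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tids[i], NULL);
	return 0;
}

Build with gcc -O2 -pthread and compare 'time ./a.out shared' against
'time ./a.out'; the gap is the same cache line bouncing we are paying
in uprobe_dispatcher().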
> >
> > Drop it and emit an obviously unrealistic value in its stead in the
> > uprobe_profile seq file.
> >
> > The alternative would be allocating per-CPU counter, but I'm not sure
> > it's justified.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@...nel.org>
> > ---
> > kernel/trace/trace_uprobe.c | 4 +---
> > 1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > index 52e76a73fa7c..5d38207db479 100644
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -62,7 +62,6 @@ struct trace_uprobe {
> > struct uprobe *uprobe;
> > unsigned long offset;
> > unsigned long ref_ctr_offset;
> > - unsigned long nhit;
> > struct trace_probe tp;
> > };
> >
> > @@ -821,7 +820,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
> >
> > tu = to_trace_uprobe(ev);
> > seq_printf(m, " %s %-44s %15lu\n", tu->filename,
> > - trace_probe_name(&tu->tp), tu->nhit);
> > + trace_probe_name(&tu->tp), ULONG_MAX);
>
> seems harsh.. would it be that bad to create per cpu counter for that?
Well, consider this patch a conversation starter. There are two
reasons why I'm removing the counter instead of switching to a
per-CPU one:
- it's less work to send out a patch pointing out the problem (but
the solution might change);
- this counter was never correct in the presence of multiple
threads, so I'm not sure how useful it is.
Yes, I think we can do per-CPU counters, but do we want to pay the
memory price? That's the feedback I'd like to get from Masami,
Steven, or Peter (whoever cares enough).
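If we do decide the stats are worth keeping, I'd imagine the per-CPU
variant would look roughly like this (untested sketch just to make
the trade-off concrete, the "nhits" field name is made up):

/* in struct trace_uprobe, replacing the plain counter: */
	unsigned long __percpu	*nhits;

/* allocated when the trace_uprobe is created: */
	tu->nhits = alloc_percpu(unsigned long);
	if (!tu->nhits)
		return ERR_PTR(-ENOMEM);

/* and freed when it is destroyed: */
	free_percpu(tu->nhits);

/* in uprobe_dispatcher(), instead of tu->nhit++: */
	this_cpu_inc(*tu->nhits);

/* and probes_profile_seq_show() would sum it up: */
	unsigned long nhits = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		nhits += *per_cpu_ptr(tu->nhits, cpu);

	seq_printf(m, " %s %-44s %15lu\n", tu->filename,
		   trace_probe_name(&tu->tp), nhits);

That's sizeof(unsigned long) * num_possible_cpus() (plus percpu
allocator overhead) per uprobe, which is exactly the memory price
question above.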
>
> jirka
>
> > return 0;
> > }
> >
> > @@ -1507,7 +1506,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
> > int ret = 0;
> >
> > tu = container_of(con, struct trace_uprobe, consumer);
> > - tu->nhit++;
> >
> > udd.tu = tu;
> > udd.bp_addr = instruction_pointer(regs);
> > --
> > 2.43.5
> >