[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzZSyuFexZfwZs1bA9S=O0FHejw_tE6PXm5h8ftMsuSROw@mail.gmail.com>
Date: Wed, 7 Aug 2024 08:24:04 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Andrii Nakryiko <andrii@...nel.org>, linux-trace-kernel@...r.kernel.org,
peterz@...radead.org, rostedt@...dmis.org, mhiramat@...nel.org,
bpf@...r.kernel.org, linux-kernel@...r.kernel.org, jolsa@...nel.org,
paulmck@...nel.org
Subject: Re: [PATCH 0/8] uprobes: RCU-protected hot path optimizations
On Wed, Aug 7, 2024 at 6:30 AM Oleg Nesterov <oleg@...hat.com> wrote:
>
> On 07/31, Andrii Nakryiko wrote:
> >
> > Andrii Nakryiko (6):
> > uprobes: revamp uprobe refcounting and lifetime management
> > uprobes: protected uprobe lifetime with SRCU
> > uprobes: get rid of enum uprobe_filter_ctx in uprobe filter callbacks
> > uprobes: travers uprobe's consumer list locklessly under SRCU
> > protection
> > uprobes: perform lockless SRCU-protected uprobes_tree lookup
> > uprobes: switch to RCU Tasks Trace flavor for better performance
> >
> > Peter Zijlstra (2):
> > rbtree: provide rb_find_rcu() / rb_find_add_rcu()
> > perf/uprobe: split uprobe_unregister()
>
> I see nothing wrong in 1-7. LGTM.
>
So, I don't know how it slipped the first time, because I tested
overnight with uprobe-stress, perhaps I adjusted some parameters (more
threads or different ratio of threads, not sure by now), but it turns
out that lockless RB-tree traversal actually crashes after a few
minutes of running uprobe-stress. I'll post details separately later
today, but I suspect that rb_find_rcu() and rb_find_add_rcu() is not
enough to make it safe.
Hopefully someone can help me figure this out.
> But since you are going to send the new version, I'd like to apply V2
> and then try to re-check the resulting code.
Yes, I was waiting for more of Peter's comments, but I guess I'll just
send a v2 today. I'll probably include the SRCU+timeout logic for
return_instances, and maybe lockless VMA parts as well. Those won't be
yet for landing, but it's probably useful to see everything
end-to-end.
Given the hiccup with lockless uprobes_tree lookup, we should land
that change, but the first 4 patches hopefully are in good enough
shape to apply and reduce the amount of patches that need to be
juggled around.
>
> As for 8/8 - I leave it to you and Peter. I'd prefer SRCU though ;)
Honestly curious, why the preference?
>
> Oleg.
>
BTW, while you are here :) What can you say about
current->sighand->siglock use in handle_singlestep()? Is there any way
to avoid that (or at least not have to take it every single time a
single-stepped uprobe is triggered?). This turned out to be a huge
issue for single-stepped uprobe when scaling to multiple CPUs even
with all the other things (all the SRCU and lockless VMA lookup) taken
care of.
Powered by blists - more mailing lists