lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzYbkXHw-wZ=+0pCnGFVD-zuuhC-b=5Uz+w9F_RuPEFMOw@mail.gmail.com>
Date: Wed, 3 Jul 2024 13:55:21 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Paul E . McKenney" <paulmck@...nel.org>, Andrii Nakryiko <andrii@...nel.org>, 
	linux-trace-kernel@...r.kernel.org, rostedt@...dmis.org, mhiramat@...nel.org, 
	oleg@...hat.com, mingo@...hat.com, bpf@...r.kernel.org, jolsa@...nel.org, 
	clm@...a.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 00/12] uprobes: add batched register/unregister APIs
 and per-CPU RW semaphore

On Wed, Jul 3, 2024 at 1:07 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Tue, Jul 02, 2024 at 09:47:41PM -0700, Andrii Nakryiko wrote:
>
> > > As you noted, that percpu-rwsem write side is quite insane. And you're
> > > creating this batch complexity to mitigate that.
> >
> >
> > Note that batch API is needed regardless of percpu RW semaphore or
> > not. As I mentioned, once uprobes_treelock is mitigated one way or the
> > other, the next one is uprobe->register_rwsem. For scalability, we
> > need to get rid of it and preferably not add any locking at all. So
> > tentatively I'd like to have lockless RCU-protected iteration over
> > uprobe->consumers list and call consumer->handler(). This means that
> > on uprobes_unregister we'd need synchronize_rcu (for whatever RCU
> > flavor we end up using), to ensure that we don't free uprobe_consumer
> > memory from under handle_swbp() while it is actually triggering
> > consumers.
> >
> > So, without batched unregistration we'll be back to the same problem
> > I'm solving here: doing synchronize_rcu() for each attached uprobe one
> > by one is prohibitively slow. We went through this exercise with
> > ftrace/kprobes already and fixed it with batched APIs. Doing that for
> > uprobes seems unavoidable as well.
>
> I'm not immediately seeing how you need that terrible refcount stuff for

Which part is terrible, please be more specific. I can switch to
refcount_inc_not_zero() and leave performance improvement on the
table, but why is that a good idea?

> the batching though. If all you need is group a few unregisters together
> in order to share a sync_rcu() that seems way overkill.
>
> You seem to have muddled the order of things, which makes the actual
> reason for doing things utterly unclear.

See -EGAIN handling in uprobe_register() code in current upstream
kernel. We manage to allocate and insert (or update existing) uprobe
in uprobes_tree. And then when we try to register we can post factum
detect that uprobe was removed from RB tree from under us. And we have
to go on a retry, allocating/inserting/updating it again.

This is quite problematic for batched API, in which I split the whole
attachment into few independent phase:

  - preallocate uprobe instances (for all consumers/uprobes)
  - insert them or reuse pre-existing ones (again, for all consumers
in one batch, protected by single writer lock on uprobes_treelock);
  - then register/apply for each VMA (you get it, for all consumers in one go).

Having this retry for some of uprobes because of this race is hugely
problematic, so I wanted to make it cleaner and simpler: once you
manage to insert/reuse uprobe, it's not going away from under me.
Which is why the change to refcounting schema.

And I think it's a major improvement. We can argue about
refcount_inc_not_zero vs this custom refcounting schema, but I think
the change should be made.

Now, imagine I also did all the seqcount and RCU stuff across entire
uprobe functionality. Wouldn't that be mind bending a little bit to
wrap your head around this?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ