lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240820150534.GD12400@redhat.com>
Date: Tue, 20 Aug 2024 17:05:34 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Andrii Nakryiko <andrii@...nel.org>, linux-trace-kernel@...r.kernel.org,
	peterz@...radead.org, rostedt@...dmis.org, mhiramat@...nel.org,
	bpf@...r.kernel.org, linux-kernel@...r.kernel.org, jolsa@...nel.org,
	paulmck@...nel.org, willy@...radead.org, surenb@...gle.com,
	akpm@...ux-foundation.org, linux-mm@...ck.org
Subject: Re: [PATCH RFC v3 09/13] uprobes: SRCU-protect uretprobe lifetime
 (with timeout)

On 08/19, Andrii Nakryiko wrote:
>
> On Mon, Aug 19, 2024 at 6:41 AM Oleg Nesterov <oleg@...hat.com> wrote:
> >
> > On 08/12, Andrii Nakryiko wrote:
> > >
> > > Avoid taking refcount on uprobe in prepare_uretprobe(), instead take
> > > uretprobe-specific SRCU lock and keep it active as kernel transfers
> > > control back to user space.
> > ...
> > >  include/linux/uprobes.h |  49 ++++++-
> > >  kernel/events/uprobes.c | 294 ++++++++++++++++++++++++++++++++++------
> > >  2 files changed, 301 insertions(+), 42 deletions(-)
> >
> > Oh. To be honest I don't like this patch.
> >
> > I would like to know what other reviewers think, but to me it adds too many
> > complications that I can't even fully understand...
>
> Which parts? The atomic xchg() and cmpxchg() parts? What exactly do
> you feel like you don't fully understand?

Heh, everything looks too complex for me ;)

Say, hprobe_expire(). It is also called by ri_timer() in softirq context,
right? And it does

	/* We lost the race, undo our refcount bump. It can drop to zero. */
	put_uprobe(uprobe);

How so? If the refcount goes to zero, put_uprobe() does mutex_lock(),
but it must not sleep in softirq context.


Or, prepare_uretprobe() which does

	rcu_assign_pointer(utask->return_instances, ri);

	if (!timer_pending(&utask->ri_timer))
		mod_timer(&utask->ri_timer, ...);

Suppose that the timer was pending and it was fired right before
rcu_assign_pointer(). What guarantees that prepare_uretprobe() will see
timer_pending() == false?

rcu_assign_pointer()->smp_store_release() is a one-way barrier. This
timer_pending() check may appear to happen before rcu_assign_pointer()
completes.

In this (yes, theoretical) case ri_timer() can miss the new return_instance,
while prepare_uretprobe() can miss the necessary mod_timer(). I think this
needs another mb() in between.


And I can't convince myself hprobe_expire() is correct... OK, I don't
fully understand the logic, but why data_race(READ_ONCE(hprobe->leased)) ?
READ_ONCE() should be enough in this case?


> > As I have already mentioned in the previous discussions, we can simply kill
> > utask->active_uprobe. And utask->auprobe.
>
> I don't have anything against that, in principle, but let's benchmark
> and test that thoroughly. I'm a bit uneasy about the possibility that
> some arch-specific code will do container_of() on this arch_uprobe in
> order to get to uprobe,

Well, struct uprobe is not "exported", the arch-specific code can't do this.

Oleg.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ