Message-ID: <20250205165200.v9iduJ1S@linutronix.de>
Date: Wed, 5 Feb 2025 17:52:00 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org,
	André Almeida <andrealmeid@...lia.com>,
	Darren Hart <dvhart@...radead.org>,
	Davidlohr Bueso <dave@...olabs.net>, Ingo Molnar <mingo@...hat.com>,
	Juri Lelli <juri.lelli@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Waiman Long <longman@...hat.com>
Subject: Re: [PATCH v8 00/15] futex: Add support task local hash maps.

On 2025-02-05 13:52:50 [+0100], Peter Zijlstra wrote:
> On Wed, Feb 05, 2025 at 01:20:26PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2025-02-04 16:14:05 [+0100], Peter Zijlstra wrote:
> > 
> > This does not compile. Let me fix this up; a few comments…
> 
> Moo, clangd didn't complain :/ But yeah, I didn't actually compile this,
> only had neovim running clangd.

Don't worry, I assumed it was a sketch :) Let me add that one to the
list of things to try…

> > > diff --git a/io_uring/futex.c b/io_uring/futex.c
> > > index 3159a2b7eeca..18cd5ccde36d 100644
> > > --- a/io_uring/futex.c
> > > +++ b/io_uring/futex.c
> > > @@ -332,13 +331,13 @@ int io_futex_wait(struct io_kiocb *req, unsigned int issue_flags)
> > >  	ifd->q.wake = io_futex_wake_fn;
> > >  	ifd->req = req;
> > >  
> > > +	// XXX task->state is messed up
> > >  	ret = futex_wait_setup(iof->uaddr, iof->futex_val, iof->futex_flags,
> > > -			       &ifd->q, &hb);
> > > +			       &ifd->q, NULL);
> > >  	if (!ret) {
> > >  		hlist_add_head(&req->hash_node, &ctx->futex_list);
> > >  		io_ring_submit_unlock(ctx, issue_flags);
> > >  
> > > -		futex_queue(&ifd->q, hb);
> > >  		return IOU_ISSUE_SKIP_COMPLETE;
> > 
> > This looks interesting. This is called from
> > req->io_task_work.func = io_req_task_submit
> > | io_req_task_submit()
> > | -> io_issue_sqe()
> > |    -> def->issue() <- io_futex_wait
> > 
> > and
> > io_fallback_req_func() iterates over a list and invokes
> > req->io_task_work.func. This also seems to be invoked from
> > io_sq_thread() (via io_sq_tw() -> io_handle_tw_list()).
> > 
> > If this (wait and wake) is only used within kernel threads then it is
> > fine. If the waker and/or waiter are in user context then we are in
> > trouble, because one will use the private hash of the process and the
> > other won't, since it is a kernel thread. So the messed-up task->state
> > is the least of our problems.
> 
> Right, so the io-uring stuff is tricky, I think this more or less does
> what it used to though. I 'simply' moved the futex_queue() into
> futex_wait_setup().

This is correct. The change to the task's state may or may not be relevant.
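
For readers following along, roughly what that move amounts to (a sketch
based on the diff above, not the actual patch; the hb_out handling and
the error returns are my assumptions):

	/*
	 * Sketch: futex_wait_setup() queues the futex itself before
	 * returning, so a caller such as io_futex_wait() no longer needs
	 * the hash bucket and can pass NULL for the out-pointer.
	 */
	int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
			     struct futex_q *q, struct futex_hash_bucket **hb_out)
	{
		struct futex_hash_bucket *hb;
		u32 uval;
		int ret;

		ret = get_futex_key(uaddr, flags, &q->key, FUTEX_READ);
		if (ret)
			return ret;

		hb = futex_q_lock(q);
		ret = futex_get_value_locked(&uval, uaddr);
		if (ret || uval != val) {
			futex_q_unlock(hb);
			return ret ? ret : -EAGAIN;
		}

		if (hb_out)
			*hb_out = hb;
		/* The queueing moved in here; it also drops the hb lock. */
		futex_queue(q, hb);
		return 0;
	}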

> IIRC the io-uring threads share the process-mm but will never hit
> userspace.

One function had a kworker prototype, and there are also the PF_IO_WORKER
threads. I need to clarify with the io_uring people how this works…
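
To illustrate the concern (purely a sketch, with made-up field and
helper names): a lookup that uses the mm's private hash for user tasks
but the global hash for kernel threads would send waiter and waker to
different tables for the same uaddr.

	static struct futex_hash_bucket *futex_hash(union futex_key *key)
	{
		/* ->futex_phash and both helpers are invented for illustration */
		if (current->mm && current->mm->futex_phash)
			return __futex_hash_private(current->mm, key);
		return __futex_hash_global(key);
	}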

> > > @@ -451,20 +442,22 @@ int futex_wait_multiple_setup(struct futex_vector *vs, int count, int *woken)
> > >  		struct futex_q *q = &vs[i].q;
> > >  		u32 val = vs[i].w.val;
> > >  
> > > -		hb = futex_q_lock(q);
> > > -		ret = futex_get_value_locked(&uval, uaddr);
> > > +		if (1) {
> > > +			CLASS(hb_q_lock, hb)(q);
> > > +			ret = futex_get_value_locked(&uval, uaddr);
> > 
> > This confused me at the beginning because I expected hb_q_lock to have
> > the lock part in the constructor and the matching unlock in the
> > destructor. But no, this is not the case.
> 
> Agreed, that *is* rather ugly. The sane way to fix that might be to
> untangle futex_q_lock() from futex_hash(). And instead do:
> 
> 			CLASS(hb, hb)(&q->key);
> 			futex_q_lock(q, hb);
> 
> Or somesuch. That might be a nice cleanup either way.

If you know, you know.
We could take the lock manually in the first iteration and then try to
hide it later on. The error path always drops it early; we could unlock
manually there and tell the compiler that it is done.
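
Something along these lines, perhaps (assuming the cleanup.h CLASS
machinery, a NULL-tolerant destructor, and futex_hash()/futex_hash_put()
as placeholder names for the reference get/put):

	/* The guard owns only the bucket reference; locking stays explicit. */
	DEFINE_CLASS(hb, struct futex_hash_bucket *,
		     if (_T) futex_hash_put(_T),	/* dtor: drop ref, skip if NULLed */
		     futex_hash(key),			/* ctor: bucket lookup + reference */
		     union futex_key *key);

		CLASS(hb, hb)(&q->key);
		futex_q_lock(q, hb);
		ret = futex_get_value_locked(&uval, uaddr);
		if (ret) {
			futex_q_unlock(hb);	/* error path: unlock early... */
			futex_hash_put(hb);	/* ...drop the reference early... */
			hb = NULL;		/* ...and disarm the destructor */
		}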

> > So the beauty of it is that you enforce a ref drop on hb once it gets
> > out of scope. So you can't use it by chance once the ref is dropped.
> 
> Right.
> 
> > But this does not help in futex_lock_pi(), where you have to drop the
> > reference before __rt_mutex_start_proxy_lock() (or at least before
> > rt_mutex_wait_proxy_lock()) but still need it if you go for the no_block
> > shortcut. At that point even the lock is still held.
> > 
> > While it makes the other cases nicer, the futex_lock_pi() function was
> > the only one where I was thinking about setting hb to NULL to avoid
> > accidental usage later on.
> 
> OK, so yeah, I got completely lost in futex_lock_pi(), and I couldn't
> figure out what you did there. Let me try and untangle that again.

The hash-bucket reference should be dropped if the task is going to
sleep. The trylock path there fast-forwards to the end.
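
In sketch form (ordering only, error handling elided; futex_hash_put()
is again an assumed name for the reference drop):

		if (trylock_succeeded)			/* placeholder condition */
			goto no_block;	/* fast-forward: hb lock and reference
					 * are still held at the label */

		__rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);

		/*
		 * Going to sleep: drop the bucket lock and the reference so
		 * the bucket does not pin a hash resize while we block.
		 */
		futex_q_unlock(hb);
		futex_hash_put(hb);

		ret = rt_mutex_wait_proxy_lock(&q.pi_state->pi_mutex, to, &rt_waiter);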

From memory, the unlock-PI (and requeue-PI) path is also a bit odd
because I need to grab an extra reference in a corner case: the task is
no longer enqueued but I still need a stable ::lock_ptr, so I have to
hold off a resize. Otherwise ::lock_ptr would end up pointing to a
stale hash bucket.
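
Roughly (with an assumed futex_hash_get()/futex_hash_put() pair):

		/*
		 * Corner case: the waiter is already dequeued but we still
		 * take q->lock_ptr. Pin the hash with an extra reference so
		 * a concurrent resize cannot retire the bucket and leave
		 * ::lock_ptr pointing at a stale (freed) bucket.
		 */
		futex_hash_get(hb);		/* assumed helper: blocks a resize */
		spin_lock(q->lock_ptr);
		/* ... unlock/requeue work under a stable lock_ptr ... */
		spin_unlock(q->lock_ptr);
		futex_hash_put(hb);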

Sebastian
