lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <62825712-36bc-483c-9bca-db3d9233b0d2@efficios.com>
Date: Thu, 23 May 2024 16:20:16 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: André Almeida <andrealmeid@...lia.com>,
 Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
 "Paul E . McKenney" <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
 "H . Peter Anvin" <hpa@...or.com>, Paul Turner <pjt@...gle.com>,
 linux-api@...r.kernel.org, Christian Brauner <brauner@...nel.org>,
 Florian Weimer <fw@...eb.enyo.de>, David.Laight@...LAB.COM,
 carlos@...hat.com, Peter Oskolkov <posk@...k.io>,
 Alexander Mikhalitsyn <alexander@...alicyn.com>,
 Chris Kennelly <ckennelly@...gle.com>, Ingo Molnar <mingo@...hat.com>,
 Darren Hart <dvhart@...radead.org>, Davidlohr Bueso <dave@...olabs.net>,
 libc-alpha@...rceware.org, Steven Rostedt <rostedt@...dmis.org>,
 Jonathan Corbet <corbet@....net>, Noah Goldstein <goldstein.w.n@...il.com>,
 Daniel Colascione <dancol@...gle.com>, longman@...hat.com,
 kernel-dev@...lia.com
Subject: Re: [PATCH v2 1/1] futex: Add FUTEX_SPIN operation

On 2024-05-23 16:07, André Almeida wrote:
> Add a new mode for futex wait, the futex spin.
> 
> Given the FUTEX2_SPIN flag, parse the futex value as the TID of the lock
> owner. Then, before going to the normal wait path, spins while the lock
> owner is running in a different CPU, to avoid the whole context switch
> operation and to quickly return to userspace. If the lock owner is not
> running, just sleep as the normal futex wait path.
> 
> The user value is masked with FUTEX_TID_MASK, to allow some bits for
> future use.
> 
> The check for the owner to be running or not is important to avoid
> spinning for something that won't be released quickly. Userspace is
> responsible on providing the proper TID, the kernel does a basic check.
> 
> Signed-off-by: André Almeida <andrealmeid@...lia.com>
> ---

[...]

> +
> +static int futex_spin(struct futex_hash_bucket *hb, struct futex_q *q,
> +		       struct hrtimer_sleeper *timeout, u32 uval)
> +{
> +	struct task_struct *p;
> +	pid_t pid = uval & FUTEX_TID_MASK;
> +
> +	p = find_get_task_by_vpid(pid);
> +
> +	/* no task found, maybe it already exited */
> +	if (!p) {
> +		futex_q_unlock(hb);
> +		return -EAGAIN;
> +	}
> +
> +	/* can't spin in a kernel task */
> +	if (unlikely(p->flags & PF_KTHREAD)) {
> +		put_task_struct(p);
> +		futex_q_unlock(hb);
> +		return -EPERM;
> +	}
> +
> +	futex_queue(q, hb);
> +
> +	if (timeout)
> +		hrtimer_sleeper_start_expires(timeout, HRTIMER_MODE_ABS);
> +
> +	while (1) {

Infinite loops in other kernel/futex/ files appear to use "for (;;) {"
instead.

> +		if (likely(!plist_node_empty(&q->list))) {
> +			if (timeout && !timeout->task)
> +				goto exit;
> +
> +			if (task_on_cpu(p)) {
> +				/* spin */

You may want to add a "cpu_relax();" here to lessen the
power consumption of this busy-loop.

> +				continue;
> +			} else {
> +				/* task is not running, sleep */
> +				break;
> +			}
> +		} else {
> +			goto exit;
> +		}
> +	}
> +
> +	/* spinning didn't work, go to the normal path */
> +	set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);

I wonder if flipping the order between:

         set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
         futex_queue(q, hb);

as done in futex_wait_queue() and what is done here matters ?
Does it introduce a race where a wakeup could be missed ?

If it's an issue, then setting the current state could be done
before invoking futex_queue(), and whenever the spin exits,
set it back to TASK_RUNNING.


> +
> +	if (likely(!plist_node_empty(&q->list))) {
> +		if (!timeout || timeout->task)
> +			schedule();
> +	}
> +
> +	__set_current_state(TASK_RUNNING);
> +
> +exit:
> +	put_task_struct(p);
> +	return 0;
> +}
> +
>   /**
>    * futex_unqueue_multiple - Remove various futexes from their hash bucket
>    * @v:	   The list of futexes to unqueue
> @@ -665,8 +732,15 @@ int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
>   	if (ret)
>   		return ret;
>   
> -	/* futex_queue and wait for wakeup, timeout, or a signal. */
> -	futex_wait_queue(hb, &q, to);
> +	if (flags & FLAGS_SPIN) {
> +		ret = futex_spin(hb, &q, to, val);
The empty line below could be removed.

Thanks,

Mathieu

> +
> +		if (ret)
> +			return ret;
> +	} else {
> +		/* futex_queue and wait for wakeup, timeout, or a signal. */
> +		futex_wait_queue(hb, &q, to);
> +	}
>   
>   	/* If we were woken (and unqueued), we succeeded, whatever. */
>   	if (!futex_unqueue(&q))

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ