lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250502090529.GU4198@noisy.programming.kicks-ass.net>
Date: Fri, 2 May 2025 11:05:29 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Prakash Sangappa <prakash.sangappa@...cle.com>
Cc: linux-kernel@...r.kernel.org, rostedt@...dmis.org,
	mathieu.desnoyers@...icios.com, tglx@...utronix.de,
	bigeasy@...utronix.de, kprateek.nayak@....com
Subject: Re: [PATCH V3 1/4] Sched: Scheduler time slice extension

On Fri, May 02, 2025 at 01:59:52AM +0000, Prakash Sangappa wrote:
> Add support for a thread to request extending its execution time slice on
> the cpu. The extra cpu time granted would help in allowing the thread to
> complete executing the critical section and drop any locks without getting
> preempted. The thread would request this cpu time extension, by setting a
> bit in the restartable sequences(rseq) structure registered with the kernel.
> 
> Kernel will grant a 50us extension on the cpu, when it sees the bit set.
> With the help of a timer, kernel force preempts the thread if it is still
> running on the cpu when the 50us timer expires. The thread should yield
> the cpu by making a system call after completing the critical section.
> 
> Suggested-by: Peter Ziljstra <peterz@...radead.org>
> Signed-off-by: Prakash Sangappa <prakash.sangappa@...cle.com>
> ---

> diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
> index c233aae5eac9..900cb75f6a88 100644
> --- a/include/uapi/linux/rseq.h
> +++ b/include/uapi/linux/rseq.h
> @@ -26,6 +26,7 @@ enum rseq_cs_flags_bit {
>  	RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT	= 0,
>  	RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT	= 1,
>  	RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT	= 2,
> +	RSEQ_CS_FLAG_DELAY_RESCHED_BIT		= 3,
>  };
>  
>  enum rseq_cs_flags {
> @@ -35,6 +36,8 @@ enum rseq_cs_flags {
>  		(1U << RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT),
>  	RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE	=
>  		(1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT),
> +	RSEQ_CS_FLAG_DELAY_RESCHED		=
> +		(1U << RSEQ_CS_FLAG_DELAY_RESCHED_BIT),
>  };
>  
>  /*
> @@ -128,6 +131,10 @@ struct rseq {
>  	 * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
>  	 *     Inhibit instruction sequence block restart on migration for
>  	 *     this thread.
> +	 * - RSEQ_CS_DELAY_RESCHED
> +	 *     Request by user task to try delaying preemption. With
> +	 *     use of a timer, extra cpu time upto 50us is granted for this
> +	 *     thread before being rescheduled.
>  	 */
>  	__u32 flags;

So while it works for testing, this really is a rather crap interface
for real because userspace cannot up front tell if its going to work or
not.

> +void rseq_delay_resched_fini(void)
> +{
> +#ifdef CONFIG_SCHED_HRTICK
> +	extern void hrtick_local_start(u64 delay);
> +	struct task_struct *t = current;
> +	/*
> +	 * IRQs off, guaranteed to return to userspace, start timer on this CPU
> +	 * to limit the resched-overdraft.
> +	 *
> +	 * If your critical section is longer than 50 us you get to keep the
> +	 * pieces.
> +	 */
> +	if (t->sched_time_delay)
> +		hrtick_local_start(50 * NSEC_PER_USEC);
> +#endif
> +}

Should not the interface at least reflect this SCHED_HRTICK status? One,
slightly hacky way of doing this might to be invert the bit. Have the
system write a 1 when the feature is present, and have userspace flip it
to 0 to activate.

A better way might be to add a second bit.

Also, didn't we all agree 50us was overly optimistic and this number
should be lower?

> diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> index cd38f4e9899d..1b2b64fe0fb1 100644
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -1378,6 +1378,11 @@ static void do_sched_yield(void)
>   */
>  SYSCALL_DEFINE0(sched_yield)
>  {
> +	if (IS_ENABLED(CONFIG_RSEQ) && current->sched_time_delay) {
> +		schedule();
> +		return 0;
> +	}
> +
>  	do_sched_yield();
>  	return 0;
>  }

Multiple people, very much including Linus, have already said this
'cute' hack isn't going to fly. Why is it still here?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ