Message-ID: <CABCx4RC2e09tUYC+B025MC0oHMrifJdax=n=8Q8mLmuF=bW4MA@mail.gmail.com>
Date: Tue, 19 Aug 2025 16:08:32 +0200
From: Kuba Piecuch <jpiecuch@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, vincent.guittot@...aro.org, dietmar.eggemann@....com, 
	joshdon@...gle.com, linux-kernel@...r.kernel.org, 
	david.laight.linux@...il.com
Subject: Re: [RFC PATCH 0/3] sched: add ability to throttle sched_yield()
 calls to reduce contention

On Thu, Aug 14, 2025 at 4:53 PM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Mon, Aug 11, 2025 at 03:35:35PM +0200, Kuba Piecuch wrote:
> > On Mon, Aug 11, 2025 at 10:36 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Fri, Aug 08, 2025 at 08:02:47PM +0000, Kuba Piecuch wrote:
> > > > Problem statement
> > > > =================
> > > >
> > > > Calls to sched_yield() can touch data shared with other threads.
> > > > Because of this, userspace threads could generate high levels of contention
> > > > by calling sched_yield() in a tight loop from multiple cores.
> > > >
> > > > For example, if cputimer is enabled for a process (e.g. through
> > > > setitimer(ITIMER_PROF, ...)), all threads of that process
> > > > will do an atomic add on the per-process field
> > > > p->signal->cputimer->cputime_atomic.sum_exec_runtime inside
> > > > account_group_exec_runtime(), which is called inside update_curr().
> > > >
> > > > Currently, calling sched_yield() will always call update_curr() at least
> > > > once in schedule(), and potentially one more time in yield_task_fair().
> > > > Thus, userspace threads can generate quite a lot of contention for the
> > > > cacheline containing cputime_atomic.sum_exec_runtime if multiple threads of
> > > > a process call sched_yield() in a tight loop.
> > > >
> > > > At Google, we suspect that this contention led to a full machine lockup in
> > > > at least one instance, with ~50% of CPU cycles spent in the atomic add
> > > > inside account_group_exec_runtime() according to
> > > > `perf record -a -e cycles`.
> > >
> > > I've gotta ask, WTH is your userspace calling yield() so much?
> >
> > The code calling sched_yield() was in the wait loop for a spinlock. It
> > would repeatedly yield until the compare-and-swap instruction succeeded
> > in acquiring the lock. This code runs in the SIGPROF handler.
>
> Well, then don't do that... userspace spinlocks are terrible, and
> bashing yield like that isn't helpful either.
>
> Throttling yield seems like entirely the wrong thing to do. Yes, yield()
> is poorly defined (strictly speaking UB for anything not FIFO/RR) but
> making it actively worse doesn't seem helpful.
>
> The whole itimer thing is not scalable -- blaming that on yield seems
> hardly fair.
>
> Why not use timer_create(), with CLOCK_THREAD_CPUTIME_ID and
> SIGEV_SIGNAL instead?

I agree that there are userspace changes we can make to reduce contention
and prevent future lockups. What that doesn't address is the potential for
userspace, whether maliciously or unintentionally, to trigger kernel
lockups by spamming yield(). This patch series introduces a way to reduce
contention and the risk of userspace-induced lockups regardless of
userspace behavior -- that's the value proposition.
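
For context, the problematic pattern looks roughly like this (a hypothetical sketch, not our actual code): every failed compare-and-swap yields, so many spinning threads hammer sched_yield() -> schedule() -> update_curr(), and with an active ITIMER_PROF cputimer each update_curr() does an atomic add on the shared sum_exec_runtime cacheline.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical yield-based spinlock acquire: retry the CAS, yielding
 * on every failed attempt.  Many threads spinning here concurrently
 * generate the sched_yield() storm described above. */
static void spin_lock_yield(atomic_int *lock)
{
	for (;;) {
		int expected = 0;
		if (atomic_compare_exchange_weak(lock, &expected, 1))
			return;		/* acquired */
		sched_yield();		/* contended: yield and retry */
	}
}

static void spin_unlock(atomic_int *lock)
{
	atomic_store(lock, 0);
}
```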
