Message-Id: <20240123150541.1508-1-hdanton@sina.com>
Date: Tue, 23 Jan 2024 23:05:41 +0800
From: Hillf Danton <hdanton@...a.com>
To: Benjamin Segall <bsegall@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] locking/percpu-rwsem: do not do lock handoff in percpu_up_write
On Mon, 22 Jan 2024 14:59:14 -0800 Benjamin Segall <bsegall@...gle.com> wrote:
> The waitq wakeup in percpu_up_write necessarily runs the wake function
> immediately in the current thread. With it calling
> __percpu_rwsem_trylock on behalf of the thread being woken, the lock is
> extremely fair and FIFO, with the window for unfairness merely being the
> time between the release of sem->block and the completion of a relevant
> trylock.
>
> However, the woken threads that now hold the lock may not be chosen to
> run for a long time, and it would be useful to have more of this window
> available for a currently running thread to unfairly take the lock
> immediately and use it.
It makes no sense for a lock acquirer to probe the owner's activity except
when spinning on the owner, nor for the owner to guess whether any acquirer
will come soon.
> This can result in priority-inversion issues
> with high contention or things like CFS_BANDWIDTH quotas.
Given that a mutex cannot avoid PI (priority inversion) and deadlock, why is
percpu-rwsem special wrt PI?
>
> The even older version of percpu_rwsem that used an rwsem at its core
> provided some related gains in a different fashion through
> RWSEM_SPIN_ON_OWNER; while it had a similarly small window, waiting
> writers could spin, making it far more likely that a thread would hit
> this window.
>
> Signed-off-by: Ben Segall <bsegall@...gle.com>
>
> ---
>
> So the actual problem we saw was that one job had severe slowdowns
> during startup with certain other jobs on the machine, and the slowdowns
> turned out to be some cgroup moves it did during startup. The antagonist
> jobs were spawning huge numbers of threads and some other internal bugs
> were exacerbating their contention. The lock handoff meant that a batch
> of antagonist threads would receive the read lock of
> cgroup_threadgroup_rwsem and at least some of those threads would take a
> long time to be scheduled.
If you want to avoid a starved lock waiter, take a look at RWSEM_FLAG_HANDOFF
in rwsem_down_read_slowpath().