lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 25 Jan 2024 19:04:56 +0800
From: Hillf Danton <hdanton@...a.com>
To: Benjamin Segall <bsegall@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	Will Deacon <will@...nel.org>,
	Waiman Long <longman@...hat.com>,
	Boqun Feng <boqun.feng@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] locking/percpu-rwsem: do not do lock handoff in percpu_up_write

On Wed, 24 Jan 2024 14:10:43 -0800 Benjamin Segall <bsegall@...gle.com>
> Hillf Danton <hdanton@...a.com> writes:
> > On Mon, 22 Jan 2024 14:59:14 -0800 Benjamin Segall <bsegall@...gle.com>
> >> So the actual problem we saw was that one job had severe slowdowns
> >> during startup with certain other jobs on the machine, and the slowdowns
> >> turned out to be some cgroup moves it did during startup. The antagonist
> >> jobs were spawning huge numbers of threads and some other internal bugs
> >> were exacerbating their contention. The lock handoff meant that a batch
> >> of antagonist threads would receive the read lock of
> >> cgroup_threadgroup_rwsem and at least some of those threads would take a
> >> long time to be scheduled.
> >
> > If you want to avoid starved lock waiter, take a look at RWSEM_FLAG_HANDOFF
> > in rwsem_down_read_slowpath().
> 
> rwsem's HANDOFF flag is the exact opposite of what this patch is doing.

You and I are not on the same page.

> Percpu-rwsem's current code has perfect handoff for read->write, and a very
> short window for write->read (or write->write) to be beaten by a new writer.

Given no chance left for spin on owner who is legal to take a ten-minute nap,
the right thing known to do on behalf of starved waiters is to add the HANDOFF
mechanism without any heuristic like you proposed for instance, in order to
force lock acquirers to go the slow path.

Only for thoughts.

--- x/kernel/locking/percpu-rwsem.c
+++ y/kernel/locking/percpu-rwsem.c
@@ -22,6 +22,7 @@ int __percpu_init_rwsem(struct percpu_rw
 	rcuwait_init(&sem->writer);
 	init_waitqueue_head(&sem->waiters);
 	atomic_set(&sem->block, 0);
+	atomic_set(&sem->ww, 0);	/* write waiters */
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
 	lockdep_init_map(&sem->dep_map, name, key, 0);
@@ -135,6 +136,9 @@ static int percpu_rwsem_wake_function(st
 	wake_up_process(p);
 	put_task_struct(p);
 
+	if (!reader)
+		atomic_dec(&sem->ww);
+
 	return !reader; /* wake (readers until) 1 writer */
 }
 
@@ -148,8 +152,10 @@ static void percpu_rwsem_wait(struct per
 	 * Serialize against the wakeup in percpu_up_write(), if we fail
 	 * the trylock, the wakeup must see us on the list.
 	 */
-	wait = !__percpu_rwsem_trylock(sem, reader);
+	wait = atomic_read(&sem->ww) || !__percpu_rwsem_trylock(sem, reader);
 	if (wait) {
+		if (!reader)
+			atomic_inc(&sem->ww);
 		wq_entry.flags |= WQ_FLAG_EXCLUSIVE | reader * WQ_FLAG_CUSTOM;
 		__add_wait_queue_entry_tail(&sem->waiters, &wq_entry);
 	}
@@ -166,7 +172,7 @@ static void percpu_rwsem_wait(struct per
 
 bool __sched __percpu_down_read(struct percpu_rw_semaphore *sem, bool try)
 {
-	if (__percpu_down_read_trylock(sem))
+	if (!atomic_read(&sem->ww) && __percpu_down_read_trylock(sem))
 		return true;
 
 	if (try)
@@ -234,7 +240,7 @@ void __sched percpu_down_write(struct pe
 	 * Try set sem->block; this provides writer-writer exclusion.
 	 * Having sem->block set makes new readers block.
 	 */
-	if (!__percpu_down_write_trylock(sem))
+	if (atomic_read(&sem->ww) || !__percpu_down_read_trylock(sem))
 		percpu_rwsem_wait(sem, /* .reader = */ false);
 
 	/* smp_mb() implied by __percpu_down_write_trylock() on success -- D matches A */

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ