linux-kernel - Re: [RFC PATCH] locking/percpu-rwsem: do not do lock handoff in percpu_up

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <xm267cjxf7f1.fsf@bsegall-linux.svl.corp.google.com>
Date: Thu, 25 Jan 2024 13:08:02 -0800
From: Benjamin Segall <bsegall@...gle.com>
To: Hillf Danton <hdanton@...a.com>
Cc: Peter Zijlstra <peterz@...radead.org>,  Ingo Molnar <mingo@...hat.com>,
  Will Deacon <will@...nel.org>,  Waiman Long <longman@...hat.com>,  Boqun
 Feng <boqun.feng@...il.com>,  linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] locking/percpu-rwsem: do not do lock handoff in
 percpu_up_write

Hillf Danton <hdanton@...a.com> writes:

> On Wed, 24 Jan 2024 14:10:43 -0800 Benjamin Segall <bsegall@...gle.com>
>> Hillf Danton <hdanton@...a.com> writes:
>> > On Mon, 22 Jan 2024 14:59:14 -0800 Benjamin Segall <bsegall@...gle.com>
>> >> So the actual problem we saw was that one job had severe slowdowns
>> >> during startup with certain other jobs on the machine, and the slowdowns
>> >> turned out to be some cgroup moves it did during startup. The antagonist
>> >> jobs were spawning huge numbers of threads and some other internal bugs
>> >> were exacerbating their contention. The lock handoff meant that a batch
>> >> of antagonist threads would receive the read lock of
>> >> cgroup_threadgroup_rwsem and at least some of those threads would take a
>> >> long time to be scheduled.
>> >
>> > If you want to avoid starved lock waiter, take a look at RWSEM_FLAG_HANDOFF
>> > in rwsem_down_read_slowpath().
>> 
>> rwsem's HANDOFF flag is the exact opposite of what this patch is doing.
>
> You and I are not on the same page.
>
>> Percpu-rwsem's current code has perfect handoff for read->write, and a very
>> short window for write->read (or write->write) to be beaten by a new writer.
>
> Given no chance left for spin on owner who is legal to take a ten-minute nap,
> the right thing known to do on behalf of starved waiters is to add the HANDOFF
> mechanism without any heuristic like you proposed for instance, in order to
> force lock acquirers to go the slow path.
>
> Only for thoughts.

This is not the type of slowdown that is the problem my patch is trying
to address. (And due to the way percpu-rwsem works sem->ww is nearly
entirely redundant with sem->block - the first waiting writer is instead
waiting on rcuwait and holds sem->block while doing so)

The problem that my patch addresses is:

Writer is done: percpu_up_write
  atomic_set_release(&sem->block, 0);  // #1
  wake a batch of readers:
    percpu_rwsem_wake_function -> __percpu_rwsem_trylock(reader) // #2
  wake a single writer
    percpu_rwsem_wake_function -> __percpu_rwsem_trylock(writer) // #3
new writer wakes up (holding sem->block from #3)
  sees the readers holding the lock from #2, now sleeps on rcuwait
time passes // #4
readers finally get to run, run quickly and release the lock
now the writer gets to run

Currently the only source of unfairness/optimistic locking is the window
between #1 and #2, which occur in quick succession, on the same thread,
and with no SPIN_ON_OWNER to make this window more likely than it
otherwise would be.

My patch makes the entire #4 available to writers (or new readers), so
that the woken writer will instead get to run immediately. This is
obviously much less fair, but provides much better throughput (ideally
it might have some sort of delay, so that in more normal circumstances
readers don't have to win the wakeup race by luck and being woken
slightly sooner, but I don't have that).

This is also only useful because of the assumption that readers will
almost always not actually block (among other required assumptions) - if
they regularly sleep while holding the lock, then saving writers from
that first wakeup latency of readers isn't particularly helpful anymore,
because they'll still be delayed by the other wakeup latencies.