Message-Id: <20240123150541.1508-1-hdanton@sina.com>
Date: Tue, 23 Jan 2024 23:05:41 +0800
From: Hillf Danton <hdanton@...a.com>
To: Benjamin Segall <bsegall@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] locking/percpu-rwsem: do not do lock handoff in percpu_up_write
On Mon, 22 Jan 2024 14:59:14 -0800 Benjamin Segall <bsegall@...gle.com> wrote:
> The waitq wakeup in percpu_up_write necessarily runs the wake function
> immediately in the current thread. With it calling
> __percpu_rwsem_trylock on behalf of the thread being woken, the lock is
> extremely fair and FIFO, with the window for unfairness merely being the
> time between the release of sem->block and the completion of a relevant
> trylock.
>
> However, the woken threads that now hold the lock may not be chosen to
> run for a long time, and it would be useful to have more of this window
> available for a currently running thread to unfairly take the lock
> immediately and use it.
It makes no sense for a lock acquirer to probe the owner's activity except
when spinning on the owner, nor for the owner to guess whether any acquirer
will come soon.
> This can result in priority-inversion issues
> with high contention or things like CFS_BANDWIDTH quotas.
Given that a mutex cannot avoid PI (priority inversion) and deadlock, why is
percpu-rwsem special wrt PI?
>
> The even older version of percpu_rwsem that used an rwsem at its core
> provided some related gains in a different fashion through
> RWSEM_SPIN_ON_OWNER; while it had a similarly small window, waiting
> writers could spin, making it far more likely that a thread would hit
> this window.
>
> Signed-off-by: Ben Segall <bsegall@...gle.com>
>
> ---
>
> So the actual problem we saw was that one job had severe slowdowns
> during startup with certain other jobs on the machine, and the slowdowns
> turned out to be some cgroup moves it did during startup. The antagonist
> jobs were spawning huge numbers of threads and some other internal bugs
> were exacerbating their contention. The lock handoff meant that a batch
> of antagonist threads would receive the read lock of
> cgroup_threadgroup_rwsem and at least some of those threads would take a
> long time to be scheduled.
If you want to avoid a starved lock waiter, take a look at RWSEM_FLAG_HANDOFF
in rwsem_down_read_slowpath().