Message-ID: <20230109163123.mizksqivfmozaz4f@offworld>
Date: Mon, 9 Jan 2023 08:31:23 -0800
From: Davidlohr Bueso <dave@...olabs.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Mel Gorman <mgorman@...hsingularity.net>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Linux-RT <linux-rt-users@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
torvalds@...ux-foundation.org, frederic@...nel.org
Subject: Re: [RFC PATCH] locking/rwbase: Prevent indefinite writer starvation
On Mon, 09 Jan 2023, Peter Zijlstra wrote:
>On Fri, Jan 06, 2023 at 02:27:43PM +0000, Mel Gorman wrote:
>> rw_semaphore and rwlock are explicitly unfair to writers in the presence
>> of readers by design with a PREEMPT_RT configuration. Commit 943f0edb754f
>> ("locking/rt: Add base code for RT rw_semaphore and rwlock") notes;
>>
>> The implementation is writer unfair, as it is not feasible to do
>> priority inheritance on multiple readers, but experience has shown
>> that real-time workloads are not the typical workloads which are
>> sensitive to writer starvation.
>>
>> While atypical, it's also trivial to block writers with PREEMPT_RT
>> indefinitely without ever making forward progress. Since LTP-20220121,
>> the dio_truncate test case went from having 1 reader to having 16 readers
>> and the number of readers is sufficient to prevent the down_write ever
>> succeeding while readers exist. Ultimately the test is killed after 30
>> minutes as a failure.
>>
>> dio_truncate is not a realtime application but indefinite writer starvation
>> is undesirable. The test case has one writer appending and truncating files
>> A and B while multiple readers read file A. The readers and writer are
>> contending for one file's inode lock which never succeeds as the readers
>> keep reading until the writer is done which never happens.
>>
>> This patch records a timestamp when the first writer is blocked. Reader
>> bias is allowed until the first writer has been blocked for a minimum of
>> 4ms and a maximum of (4ms + 1 jiffy). The cutoff time is arbitrary on
>> the assumption that a hard realtime application missing a 4ms deadline
>> would not need PREEMPT_RT. It's expected that hard realtime applications
>> avoid such heavy reader/writer contention by design. On a test machine,
>> the test completed in 92 seconds.
>
>> static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
>> unsigned int state)
>> {
>> @@ -76,7 +79,8 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
>> * Allow readers, as long as the writer has not completely
>> * acquired the semaphore for write.
>> */
>> - if (atomic_read(&rwb->readers) != WRITER_BIAS) {
>> + if (atomic_read(&rwb->readers) != WRITER_BIAS &&
>> + jiffies - rwb->waiter_blocked < RW_CONTENTION_THRESHOLD) {
>> atomic_inc(&rwb->readers);
>> raw_spin_unlock_irq(&rtm->wait_lock);
>> return 0;
>
>Blergh.
>
>So a number of comments:
>
> - this deserves a giant comment, not only an obscure extra condition.
>
> - this would be better if it were limited to only have effect
> when there are no RT/DL tasks involved.
Agreed.
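
For illustration only, here is a rough user-space sketch of how the reader
admission check could look with both of the comments above folded in. This is
not the patch and not kernel code. Every name in it (struct rwb_sketch,
reader_may_enter(), SKETCH_HZ, SKETCH_THRESHOLD, the writer_waiting flag and
the rt_or_dl_involved argument) is made up for the example, and the threshold
value is an assumption that merely matches the "4ms, at most 4ms + 1 jiffy"
description in the changelog.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed tick rate for the sketch; the real value is a kernel config choice. */
#define SKETCH_HZ 250UL
/* Roughly 4ms expressed in ticks, plus one tick (assumption, see lead-in). */
#define SKETCH_THRESHOLD (SKETCH_HZ / 250 + 1)

/* Hypothetical stand-in for the lock state; not the kernel's struct rwbase_rt. */
struct rwb_sketch {
	bool writer_owns;             /* models readers == WRITER_BIAS */
	bool writer_waiting;          /* a writer is currently blocked */
	unsigned long waiter_blocked; /* tick at which the first writer blocked */
};

/*
 * Reader admission policy:
 *
 *  - Never admit a reader once a writer fully owns the lock.
 *  - With no writer waiting, readers are always admitted.
 *  - If RT/DL tasks are involved, keep the original reader-biased
 *    behaviour and do not apply the cutoff at all.
 *  - Otherwise admit readers only while the first blocked writer has
 *    waited less than SKETCH_THRESHOLD ticks, so that the writer
 *    eventually makes forward progress.
 *
 * The unsigned subtraction stays correct when the tick counter wraps.
 */
static bool reader_may_enter(const struct rwb_sketch *rwb,
			     unsigned long now, bool rt_or_dl_involved)
{
	assert(rwb);

	if (rwb->writer_owns)
		return false;
	if (!rwb->writer_waiting)
		return true;
	if (rt_or_dl_involved)
		return true;

	return now - rwb->waiter_blocked < SKETCH_THRESHOLD;
}
```

How "RT/DL tasks involved" would actually be determined is left open here; the
sketch just takes it as a parameter.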
(Sorry for hijacking this thread, also more Cc)

Hmm, this reminds me of the epoll rwlock situation [1, 2], which does the lockless
ready event list updates from irq callback context and hits the writer-unfair
scenario, which was really designed for tasklist_lock. Converting the read_lock
to RCU looks like a no-go because this is not a read-mostly pattern, far from
it actually. And in fact the read path is not a read path at all (ie: simply
traversing the list(s)). We also probably hit this "unfair is good for throughput"
condition mentioned by Linus, as these are spinning locks and thus a short critical
region to really benefit from actual concurrent readers.

So while the numbers in a218cc491420 ("epoll: use rwlock in order to reduce
ep_poll_callback() contention") are very nice, based on the above and the fact
that, per the changelog, it does misassume the fairness, I would vote for removing
the lockless stuff and returning to simply using a spinlock (epoll is wacky enough
already). It is ultimately less of a burden on the kernel, and I suspect that
people who really care about epoll performance will mostly be looking at io_uring.

Thanks,
Davidlohr
[1] https://lore.kernel.org/all/20210825132754.GA895675@lothringen/
[2] https://lore.kernel.org/all/20220617091039.2257083-1-eric.dumazet@gmail.com/
>
>This made me re-read the phase-fair rwlock paper and again note that RW
>semaphore (eg blocking) variant was delayed to future work and AFAICT
>this future hasn't happened yet :/
>
>AFAICT it would still require boosting the readers (something tglx still
>has nightmares of) and limiting reader concurrency, another thing that
>hurts.
>
>