linux-kernel - Re: [PATCH] f2fs: move f2fs to use reader-unfair rwsems

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <86891228-9c91-09f1-0e2d-0a3392649d52@redhat.com>
Date:   Tue, 11 Jan 2022 10:04:33 -0500
From:   Waiman Long <longman@...hat.com>
To:     Tim Murray <timmurray@...gle.com>
Cc:     Christoph Hellwig <hch@...radead.org>,
        Jaegeuk Kim <jaegeuk@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-f2fs-devel@...ts.sourceforge.net,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>
Subject: Re: [PATCH] f2fs: move f2fs to use reader-unfair rwsems


On 1/11/22 01:53, Tim Murray wrote:
> On Mon, Jan 10, 2022 at 8:15 PM Waiman Long <longman@...hat.com> wrote:
>> That is not how rwsem works. A reader which fails to get the lock
>> because it is write-locked will remove its reader count before going to
>> sleep. So the reader count will be zero eventually. Of course, there is
>> a short period of time where the reader count will be non-zero until the
>> reader removes its own reader count. So if a new writer comes in at that
>> time, it will fail its initial trylock and probably go to optimistic
>> spinning mode. If the writer that owns the lock release it at the right
>> moment, the reader may acquire the read lock.
> Thanks for the correction, that makes sense. I haven't spent too much
> time on rwsem internals and I'm not confident about when flags are set
> and cleared in sem->count; is there a case where sem->count after
> up_write() could be nonzero?
>
> An example from one trace:
>
> 1. Low-priority userspace thread 4764 is blocked in f2fs_unlink,
> probably at f2fs_lock_op, which is a wrapper around
> down_read(cp_rwsem).
> 2. f2fs-ckpt runs at t=0ms and wakes thread 4764, making it runnable.
> 3. At t=1ms, f2fs-ckpt enters uninterruptible sleep and blocks at
> rwsem_down_write_slowpath per sched_blocked_reason.
> 4. At t=26ms, thread 4764 runs for the first time since being made
> runnable. Within 40us, thread 4764 unblocks f2fs-ckpt and makes it
> runnable.
>
> Since thread 4764 is awakened by f2fs-ckpt but never runs before it
> unblocks f2fs-ckpt in down_write_slowpath(), the only idea I had is
> that cp_rwsem->count is nonzero after f2fs-ckpt's up_write() in step 2
> (maybe because of rwsem_mark_wake()?).
>
>> I do have a question about the number of readers in such a case compared
>> with the number of writers. Are there a large number of low priority
>> hanging around? What is an average read lock hold time?
>>
>> Blocking for 9.7s for a write lock is quite excessive and we need to
>> figure out how this happen.,
> Just to be 100% clear, it's not a single 9.7s stall, it's many smaller
> stalls of 10-500+ms in f2fs-ckpt that add up to 9.7s over that range.
>
> f2fs is not my area of expertise, but my understanding is that
> cp_rwsem in f2fs has many (potentially unbounded) readers and a single
> writer. Arbitrary userspace work (fsync, creating/deleting/truncating
> files, atomic writes) may grab the read lock, but assuming the
> merge_checkpoint option is enabled, only f2fs-ckpt will ever grab the
> write lock during normal operation. However, in this particular
> example, it looks like there may have been 5-10 threads blocked on
> f2fs-ckpt that were awakened alongside thread 4764 in step 2.
>
> I'll defer to the f2fs experts on the average duration that the read
> lock is held.

Thanks for the explanation.

Another question that I have is whether the test result is based on the 
latest upstream kernel or earlier kernel version. We used to allow 
reader optimistic spinning which was then removed in later kernel. 
Reader optimistic spinning may further increase writer wait time.

Anyway, AFAICS, this patch keeps readers out of the rwsem wait queue and 
so only writers can go into it. We can make an unfair rwsem to give 
preference to writers in the wait queue and wake up readers only if 
there is no more writers in the wait queue or even in the optimistic 
spinning queue. That should achieve the same effect as this patch.

Cheers,
Longman