linux-kernel - Re: [PATCH] f2fs: move f2fs to use reader-unfair rwsems

Message-ID: <YhXlXY28XiG7lVH1@infradead.org>
Date:   Tue, 22 Feb 2022 23:42:21 -0800
From:   Christoph Hellwig <hch@...radead.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Tim Murray <timmurray@...gle.com>,
        Waiman Long <longman@...hat.com>,
        Christoph Hellwig <hch@...radead.org>,
        Jaegeuk Kim <jaegeuk@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-f2fs-devel@...ts.sourceforge.net,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>
Subject: Re: [PATCH] f2fs: move f2fs to use reader-unfair rwsems

It looks like this patch landed in linux-next despite all the perfectly
reasonable objections.  Jaegeuk, please drop it again.

On Wed, Jan 12, 2022 at 03:06:12PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 10, 2022 at 11:41:23AM -0800, Tim Murray wrote:
> 
> > 1. f2fs-ckpt thread is running f2fs_write_checkpoint(), holding the
> > cp_rwsem write lock while doing so via f2fs_lock_all() in
> > block_operations().
> > 2. Random very-low-priority thread A makes some other f2fs call that
> > tries to get the cp_rwsem read lock by atomically adding on the rwsem,
> > fails and deschedules in uninterruptible sleep. cp_rwsem now has a
> > non-zero reader count but is write-locked.
> > 3. f2fs-ckpt thread releases the cp_rwsem write lock. cp_rwsem now has
> > a non-zero reader count and is not write-locked, so is reader-locked.
> > 4. Other threads call fsync(), which requests checkpoints from
> > f2fs-ckpt, and block on a completion event that f2fs-ckpt dispatches.
> > cp_rwsem still has a non-zero reader count because the low-prio thread
> > A from (2) has not been scheduled again yet.
> > 5. f2fs-ckpt wakes up to perform checkpoints, but it stalls on the
> > write lock via cmpxchg in block_operations() until the low-prio thread
> > A has run and released the cp_rwsem read lock. Because f2fs-ckpt can't
> > run, all fsync() callers are also effectively blocked by the
> > low-priority thread holding the read lock.
> > 
> > I think this is the rough shape of the problem (vs readers holding the
> > lock for too long or something like that) because the low-priority
> > thread is never run between when it is initially made runnable by
> > f2fs-ckpt and when it runs tens/hundreds of milliseconds later then
> > immediately unblocks f2fs-ckpt.
> 
> *urgh*... so you're making the worst case less likely but fundamentally
> you don't change anything.
> 
> If one of those low prio threads manages to block while holding
> cp_rwsem your checkpoint thread will still block for a very long time.
> 
> So while you improve the average case, the worst case doesn't improve
> much I think.
> 
> Also, given that this is a system wide rwsem, would percpu-rwsem not be
> 'better' ? Arguably with the same hack cgroups uses for it (see
> cgroup_init()) to lower the cost of percpu_down_write().
> 
> Now, I'm not a filesystem developer and I'm not much familiar with the
> problem space, but this locking reads like a fairly big problem. I'm not
> sure optimizing the lock is the answer.
> 
> 
---end quoted text---
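
The five-step sequence in Tim's description can be replayed with a toy counter model. This is a minimal single-threaded sketch, not the kernel's rwsem implementation: the `toy_*` names, the bit layout, and the plain (non-atomic) arithmetic are all assumptions made for illustration. It only shows why a reader that added itself to the count and then went to sleep keeps a later writer out once the first writer has released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout, loosely following the description in the thread:
 * bit 0 marks a writer, and every reader attempt adds a fixed bias. */
#define TOY_WRITER_LOCKED 1L
#define TOY_READER_BIAS   (1L << 8)

struct toy_rwsem {
	long count;
};

/* Step 2: the reader adds its bias first and only then checks for a writer.
 * Returns false when the reader would have to sleep; its bias stays in the
 * count either way. */
static bool toy_down_read(struct toy_rwsem *s)
{
	s->count += TOY_READER_BIAS;
	return !(s->count & TOY_WRITER_LOCKED);
}

/* Steps 1 and 5: the writer succeeds only if the count is exactly zero,
 * standing in for the cmpxchg mentioned in the thread. */
static bool toy_down_write_trylock(struct toy_rwsem *s)
{
	if (s->count != 0)
		return false;
	s->count = TOY_WRITER_LOCKED;
	return true;
}

/* Step 3: dropping the write lock leaves any reader bias behind. */
static void toy_up_write(struct toy_rwsem *s)
{
	assert(s->count & TOY_WRITER_LOCKED);
	s->count &= ~TOY_WRITER_LOCKED;
}

/* Thread A finally runs and releases its read lock. */
static void toy_up_read(struct toy_rwsem *s)
{
	assert(s->count >= TOY_READER_BIAS);
	s->count -= TOY_READER_BIAS;
}

/* Replays steps 1-5; returns true if the second checkpoint stalls until
 * the low-priority reader has released. */
static bool toy_replay_stall(void)
{
	struct toy_rwsem cp = { 0 };

	bool first_ckpt = toy_down_write_trylock(&cp);      /* step 1 */
	bool reader_ran = toy_down_read(&cp);               /* step 2: sleeps */
	toy_up_write(&cp);                                  /* step 3 */
	bool second_stalls = !toy_down_write_trylock(&cp);  /* steps 4-5 */
	toy_up_read(&cp);                                   /* thread A runs */
	bool second_ckpt = toy_down_write_trylock(&cp);

	return first_ckpt && !reader_ran && second_stalls && second_ckpt;
}
```

In this model `toy_replay_stall()` returns true: the second write attempt fails for as long as thread A's bias remains in the count, however briefly thread A would actually hold the lock once scheduled. The model says nothing about the case Peter raises, where a reader blocks while already holding the lock.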