[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YjtKRAgFmBfgU0al@google.com>
Date: Wed, 23 Mar 2022 09:26:44 -0700
From: Jaegeuk Kim <jaegeuk@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Tim Murray <timmurray@...gle.com>,
Waiman Long <longman@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux F2FS Dev Mailing List
<linux-f2fs-devel@...ts.sourceforge.net>
Subject: Re: [GIT PULL] f2fs for 5.18
On 03/22, Linus Torvalds wrote:
> On Tue, Mar 22, 2022 at 5:34 PM Tim Murray <timmurray@...gle.com> wrote:
> >
> > AFAICT, what's happening is that rwsem_down_read_slowpath
> > modifies sem->count to indicate that there's a pending reader while
> > f2fs_ckpt holds the write lock, and when f2fs_ckpt releases the write
> > lock, it wakes pending readers and hands the lock over to readers.
> > This means that any subsequent attempt to grab the write lock from
> > f2fs_ckpt will stall until the newly-awakened reader releases the read
> > lock, which depends on the readers' arbitrarily long scheduling
> > delays.
>
> Ugh.
>
> So I'm looking at some of this, and you have things like this:
>
> f2fs_down_read(&F2FS_I(inode)->i_sem);
> cp_reason = need_do_checkpoint(inode);
> f2fs_up_read(&F2FS_I(inode)->i_sem);
>
> which really doesn't seem to want a sleeping lock at all.
>
> In fact, it's not clear that it has any business serializing with IO
> at all. It seems to just check very basic inode state. Very strange.
> It's the kind of thing that the VFS layer tends to use te i_lock
> *spinlock* for.
Um.. let me check this i_sem, introduced by
d928bfbfe77a ("f2fs: introduce fi->i_sem to protect fi's info").
OTOH, I was suspecting the major contetion would be
f2fs_lock_op -> f2fs_down_read(&sbi->cp_rwsem);
, which was used for most of filesystem operations.
And, when we need to do checkpoint, we'd like to block internal operations by
f2fs_lock_all -> f2fs_down_write(&sbi->cp_rwsem);
So, what I expected was giving the highest priority to the checkpoint thread
by grabbing down_write to block all the other readers.
>
> And perhaps equally oddly, then when you do f2fs_issue_checkpoint(),
> _that_ code uses fancy lockless lists.
>
> I'm probably mis-reading it.
>
> Linus
Powered by blists - more mailing lists