[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220608142138.7nejka3yobgsdmd7@quack3.lan>
Date: Wed, 8 Jun 2022 16:21:38 +0200
From: Jan Kara <jack@...e.cz>
To: Matthew Wilcox <willy@...radead.org>
Cc: Jan Kara <jack@...e.cz>, Jan Kara <jack@...e.com>, tytso@....edu,
Andreas Dilger <adilger.kernel@...ger.ca>,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 3/3] ext4: Use generic_quota_read()
On Wed 08-06-22 02:42:21, Matthew Wilcox wrote:
> On Mon, Jun 06, 2022 at 10:38:14AM +0200, Jan Kara wrote:
> > On Sun 05-06-22 15:38:15, Matthew Wilcox (Oracle) wrote:
> > > The comment about the page cache is rather stale; the buffer cache will
> > > read into the page cache if the buffer isn't present, and the page cache
> > > will not take any locks if the page is present.
> > >
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> >
> > This will not work for couple of reasons, see below. BTW, I don't think the
> > comment about page cache was stale (but lacking details I admit ;). As far
> > as I remember (and it was really many years ago - definitely pre-git era)
> > the problem was (mainly on the write side) that before current state of the
> > code we were using calls like vfs_read() / vfs_write() to get quota
> > information and that was indeed prone to deadlocks.
>
> Ah yes, vfs_write() might indeed be prone to deadlocks. Particularly
> if we're doing it under the dq_mutex and any memory allocation might
> have recursed into reclaim ;-)
>
> I actually found the commit in linux-fullhistory. Changelog for
> context:
>
> commit b72debd66a6ed
> Author: Jan Kara <jack@...e.cz>
> Date: Mon Jan 3 04:12:24 2005 -0800
>
> [PATCH] Fix of quota deadlock on pagelock: quota core
\o/ to you history searching skills :)
> > > @@ -6924,20 +6882,21 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
> > > return -EIO;
> > > }
> > >
> > > - do {
> > > - bh = ext4_bread(handle, inode, blk,
> > > - EXT4_GET_BLOCKS_CREATE |
> > > - EXT4_GET_BLOCKS_METADATA_NOFAIL);
> > > - } while (PTR_ERR(bh) == -ENOSPC &&
> > > - ext4_should_retry_alloc(inode->i_sb, &retries));
> > > - if (IS_ERR(bh))
> > > - return PTR_ERR(bh);
> > > - if (!bh)
> > > + folio = read_mapping_folio(inode->i_mapping, off / PAGE_SIZE, NULL);
> > > + if (IS_ERR(folio))
> > > + return PTR_ERR(folio);
> > > + head = folio_buffers(folio);
> > > + if (!head)
> > > + head = alloc_page_buffers(&folio->page, sb->s_blocksize, false);
> > > + if (!head)
> > > goto out;
> > > + bh = head;
> > > + while ((bh_offset(bh) + sb->s_blocksize) <= (off % PAGE_SIZE))
> > > + bh = bh->b_this_page;
> >
> > We miss proper handling of blocks that are currently beyond i_size
> > (we are extending the quota file), plus we also miss any mapping of buffers
> > to appropriate disk blocks here...
> >
> > It could be all fixed by replicating what we do in ext4_write_begin() but
> > I'm not quite convinced using inode's page cache is really worth it...
>
> Ah, yes, write_begin. Of course that's what I should have used.
>
> I'm looking at this from the point of view of removing buffer_heads
> where possible. Of course, it's not possible for ext4 while the journal
> relies on buffer_heads, but if we can steer filesystems away from using
> sb_bread() (or equivalents), I think that's a good thing.
Well, ext4 uses sb_bread() (sb_getblk()) for all its metadata so quota
code, which is rather well localized, is the least of your worries I'm
afraid ;).
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists