Message-ID: <20200923024859.GM32101@casper.infradead.org>
Date: Wed, 23 Sep 2020 03:48:59 +0100
From: Matthew Wilcox <willy@...radead.org>
To: Qian Cai <cai@...hat.com>
Cc: linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
"Darrick J . Wong" <darrick.wong@...cle.com>,
Christoph Hellwig <hch@...radead.org>,
linux-nvdimm@...ts.01.org, linux-kernel@...r.kernel.org,
Dave Kleikamp <shaggy@...nel.org>,
jfs-discussion@...ts.sourceforge.net,
Dave Chinner <dchinner@...hat.com>,
Stephen Rothwell <sfr@...b.auug.org.au>,
linux-next@...r.kernel.org
Subject: Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page

On Tue, Sep 22, 2020 at 09:06:03PM -0400, Qian Cai wrote:
> On Tue, 2020-09-22 at 18:05 +0100, Matthew Wilcox wrote:
> > On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> > > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > > Size the uptodate array dynamically to support larger pages in the
> > > > page cache. With a 64kB page, we're only saving 8 bytes per page today,
> > > > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > > > per page. Add a few debugging assertions.
> > > >
> > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> > > > Reviewed-by: Dave Chinner <dchinner@...hat.com>
> > >
> > > Some syscall fuzzing will trigger this on powerpc:
> > >
> > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> > >
> > > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-
> > > io.c:78 iomap_page_release+0x250/0x270
> >
> > Well, I'm glad it triggered. That warning is:
> > WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> > PageUptodate(page));
> > so there was definitely a problem of some kind.
> >
> > truncate_cleanup_page() calls
> > do_invalidatepage() calls
> > iomap_invalidatepage() calls
> > iomap_page_release()
> >
> > Is this the first warning? I'm wondering if maybe there was an I/O error
> > earlier which caused PageUptodate to get cleared again. If it's easy to
> > reproduce, perhaps you could try something like this?
> >
> > +void dump_iomap_page(struct page *page, const char *reason)
> > +{
> > + struct iomap_page *iop = to_iomap_page(page);
> > + unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
> > +
> > + dump_page(page, reason);
> > + if (iop)
> > + printk("iop:reads %d writes %d uptodate %*pb\n",
> > + atomic_read(&iop->read_bytes_pending),
> > + atomic_read(&iop->write_bytes_pending),
> > + nr_blocks, iop->uptodate);
> > + else
> > + printk("iop:none\n");
> > +}
> >
> > and then do something like:
> >
> > if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page))
> > dump_iomap_page(page, NULL);
>
> This:
>
> [ 1683.158254][T164965] page:000000004a6c16cd refcount:2 mapcount:0 mapping:00000000ea017dc5 index:0x2 pfn:0xc365c
> [ 1683.158311][T164965] aops:xfs_address_space_operations ino:417b7e7 dentry name:"trinity-testfile2"
> [ 1683.158354][T164965] flags: 0x7fff8000000015(locked|uptodate|lru)
> [ 1683.158392][T164965] raw: 007fff8000000015 c00c0000019c4b08 c00c0000019a53c8 c000201c8362c1e8
> [ 1683.158430][T164965] raw: 0000000000000002 0000000000000000 00000002ffffffff c000201c54db4000
> [ 1683.158470][T164965] page->mem_cgroup:c000201c54db4000
> [ 1683.158506][T164965] iop:none

Oh, I'm a fool. This is after the call to detach_page_private() so
page->private is NULL and we don't get the iop dumped.
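
To make the ordering problem concrete, here is a minimal userspace sketch.
The toy_* names and structures are hypothetical stand-ins, not the kernel's
struct page, struct iomap_page or detach_page_private(); the only assumption
taken from the above is that detaching hands back the private pointer and
leaves the page's field NULL, so a dump that looks the iop up through the
page afterwards finds nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; not kernel structures. */
struct toy_iop {
	unsigned long uptodate;		/* one bit per block */
};

struct toy_page {
	struct toy_iop *private;	/* models page->private */
};

/* Assumed behaviour: return the private pointer and clear it on the page. */
static struct toy_iop *toy_detach_private(struct toy_page *page)
{
	struct toy_iop *iop;

	assert(page != NULL);
	iop = page->private;
	page->private = NULL;
	return iop;
}

/*
 * Looks the iop up through the page, as the debugging helper does.
 * Returns 1 if an iop was found and printed, 0 if it printed "iop:none".
 */
static int toy_dump(const struct toy_page *page)
{
	assert(page != NULL);
	if (!page->private) {
		printf("iop:none\n");
		return 0;
	}
	printf("iop:uptodate %#lx\n", page->private->uptodate);
	return 1;
}
```

In this model, calling toy_dump() before toy_detach_private() prints the
bitmap, while calling it afterwards prints "iop:none"; a dump that wants the
bitmap after the detach would have to use the pointer the detach returned.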

Nevertheless, this is interesting. Somehow, the page is marked Uptodate,
but the bitmap is deemed not full. There are three places where we set
an iomap page Uptodate:

1.	if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
		SetPageUptodate(page);

2.	if (page_has_private(page))
		iomap_iop_set_range_uptodate(page, off, len);
	else
		SetPageUptodate(page);

3.	BUG_ON(page->index);
	...
	SetPageUptodate(page);

It can't be #2 because the page has an iop. It can't be #3 because the
page->index is not 0. So at some point in the past, the bitmap was full.

I don't think it's possible for inode->i_blksize to change, and you
aren't running with THPs, so it's definitely not possible for thp_size()
to change. So i_blocks_per_page() isn't going to change.

We seem to have allocated enough memory for ->iop because that's also
based on i_blocks_per_page().
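
The invariant behind the warning can be sketched in userspace as follows.
This is an illustrative model under stated assumptions, not the code in
fs/iomap/buffered-io.c: the blk_* names are hypothetical, the bitmap is
capped at 64 blocks to fit one word, and only path #1 above (set the page
flag once the bitmap becomes full) is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a page with per-block uptodate tracking. */
struct blk_page {
	unsigned int nr_blocks;	/* blocks per page, assumed fixed */
	uint64_t uptodate;	/* one bit per block (model limit: 64) */
	bool page_uptodate;	/* models the page-level Uptodate flag */
};

/* Mask with the low nr_blocks bits set. */
static uint64_t blk_full_mask(unsigned int nr_blocks)
{
	assert(nr_blocks >= 1 && nr_blocks <= 64);
	return nr_blocks == 64 ? UINT64_MAX : (UINT64_C(1) << nr_blocks) - 1;
}

static bool blk_bitmap_full(const struct blk_page *p)
{
	uint64_t mask = blk_full_mask(p->nr_blocks);

	return (p->uptodate & mask) == mask;
}

/* Path #1: mark a block range uptodate; set the page flag once full. */
static void blk_set_range_uptodate(struct blk_page *p, unsigned int first,
				   unsigned int count)
{
	assert(first + count <= p->nr_blocks);
	for (unsigned int i = first; i < first + count; i++)
		p->uptodate |= UINT64_C(1) << i;
	if (blk_bitmap_full(p))
		p->page_uptodate = true;
}

/* The condition the release-time warning tests, per the quoted WARN_ON_ONCE. */
static bool blk_release_would_warn(const struct blk_page *p)
{
	return blk_bitmap_full(p) != p->page_uptodate;
}
```

With a hypothetical 16 blocks per page, filling blocks 0-7 and then 8-15
through blk_set_range_uptodate() never makes blk_release_would_warn() true.
In this model the reported state (page flag set, bitmap not full) is only
reachable if something clears a bitmap bit behind the flag's back or the
block count used for the fullness test changes, which is why the reasoning
above rules out a changing i_blocks_per_page().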

I'm out of ideas. Maybe I'll wake up with a better idea in the morning.

I've been trying to reproduce this on x86 with a 1kB block size
filesystem, and haven't been able to yet. Maybe I'll try to set up a
powerpc cross-compilation environment tomorrow.