Message-ID: <aCslObKt_kwVTn58@bfoster>
Date: Mon, 19 May 2025 08:34:01 -0400
From: Brian Foster <bfoster@...hat.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org
Subject: Re: [PATCH] ext4: only dirty folios when data journaling regular files

On Mon, May 19, 2025 at 12:24:32PM +0200, Jan Kara wrote:
> On Fri 16-05-25 13:38:00, Brian Foster wrote:
> > fstest generic/388 occasionally reproduces a crash that looks as
> > follows:
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> > ...
> > Call Trace:
> > <TASK>
> > ext4_block_zero_page_range+0x30c/0x380 [ext4]
> > ext4_truncate+0x436/0x440 [ext4]
> > ext4_process_orphan+0x5d/0x110 [ext4]
> > ext4_orphan_cleanup+0x124/0x4f0 [ext4]
> > ext4_fill_super+0x262d/0x3110 [ext4]
> > get_tree_bdev_flags+0x132/0x1d0
> > vfs_get_tree+0x26/0xd0
> > vfs_cmd_create+0x59/0xe0
> > __do_sys_fsconfig+0x4ed/0x6b0
> > do_syscall_64+0x82/0x170
> > ...
> >
> > This occurs when processing a symlink inode from the orphan list. The
> > partial block zeroing code in the truncate path calls
> > ext4_dirty_journalled_data() -> folio_mark_dirty(). The latter calls
> > mapping->a_ops->dirty_folio(), but symlink inodes are not assigned an
> > a_ops vector in ext4, hence the crash.
> >
> > To avoid this problem, update the ext4_dirty_journalled_data() helper to
> > only mark the folio dirty on regular files (for which a_ops is
> > assigned). This also matches the journaling logic in the ext4_symlink()
> > creation path, where ext4_handle_dirty_metadata() is called directly.
> >
> > Fixes: d84c9ebdac1e ("ext4: Mark pages with journalled data dirty")
> > Signed-off-by: Brian Foster <bfoster@...hat.com>
>
> Yeah, I forgot about this subtlety when writing d84c9ebdac1e. Good catch
> and thanks for fixing this up! The fix looks good. Feel free to add:
>
> Reviewed-by: Jan Kara <jack@...e.cz>
>
> > ---
> >
> > Hi Jan,
> >
> > I'm not intimately familiar with the jbd machinery here so this may well
> > be wrong, but it survives my testing so far. I initially hacked this to
> > mark the buffer dirty instead of the folio, but discovered jbd2 doesn't
> > seem to like that. I suspect that is because jbd2 wants to dirty/submit
> > the buffer itself after it's logged..?
> >
> > Anyways, after that, this struck me as most consistent with behavior
> > prior to d84c9ebdac1e and/or with the creation path, so I'm floating
> > this as a first pass. Is my understanding of d84c9ebdac1e correct in
> > that it is mainly an optimization to allow writeback to force the
> > journaling mechanism vs. otherwise waiting for the other way around
> > (i.e. a journal commit to mark folios dirty)? Thoughts appreciated..
>
> Well, the motivation for d84c9ebdac1e was not so much an optimization but
> rather to give the generic code better visibility into what needs writing
> out. Otherwise we had to special-case data journalling in a lot of places
> that tried to do "clean the inode & purge the page cache" because a simple
> filemap_write_and_wait() was not enough to get the dirty pages in the inode
> to disk.
>

Ah, I see. Thanks for the insight (and review).

Brian

> Honza
>
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 94c7d2d828a6..d3c138003ad3 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -1009,7 +1009,12 @@ int ext4_walk_page_buffers(handle_t *handle, struct inode *inode,
> >   */
> >  static int ext4_dirty_journalled_data(handle_t *handle, struct buffer_head *bh)
> >  {
> > -	folio_mark_dirty(bh->b_folio);
> > +	struct folio *folio = bh->b_folio;
> > +	struct inode *inode = folio->mapping->host;
> > +
> > +	/* only regular files have a_ops */
> > +	if (S_ISREG(inode->i_mode))
> > +		folio_mark_dirty(folio);
> >  	return ext4_handle_dirty_metadata(handle, NULL, bh);
> >  }
> >
> > --
> > 2.49.0
> >
> --
> Jan Kara <jack@...e.com>
> SUSE Labs, CR
>
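
The failure mode from the patch description (dispatching through
mapping->a_ops->dirty_folio() when a symlink inode has no a_ops assigned)
can be modeled in user space. What follows is only a minimal sketch under
stated assumptions, not kernel code: every struct here is a simplified
stand-in for the kernel type of the same name, and the MODEL_* macros,
model_dirty_folio(), model_aops and model_dirty_journalled_data() are
hypothetical names introduced for illustration. The journal handle and the
ext4_handle_dirty_metadata() call are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical file-type bits, modeled on the usual S_IFMT encoding. */
#define MODEL_S_IFMT   0170000u
#define MODEL_S_IFREG  0100000u
#define MODEL_S_IFLNK  0120000u
#define MODEL_S_ISREG(m) (((m) & MODEL_S_IFMT) == MODEL_S_IFREG)

struct folio;

/* Simplified stand-in: the real ops table has many more callbacks. */
struct address_space_operations {
	bool (*dirty_folio)(struct folio *folio);
};

struct inode {
	unsigned int i_mode;
};

struct address_space {
	struct inode *host;
	const struct address_space_operations *a_ops; /* NULL models the symlink case */
};

struct folio {
	struct address_space *mapping;
	bool dirty;
};

/* Model callback: set the dirty flag, report whether it was newly set. */
bool model_dirty_folio(struct folio *folio)
{
	bool was_clean = !folio->dirty;

	folio->dirty = true;
	return was_clean;
}

const struct address_space_operations model_aops = {
	.dirty_folio = model_dirty_folio,
};

/*
 * Guarded variant mirroring the shape of the patch: only dispatch through
 * a_ops when the host inode is a regular file. Without the check, the
 * symlink case below would call through a NULL a_ops pointer.
 * Returns true if the folio was newly dirtied.
 */
bool model_dirty_journalled_data(struct folio *folio)
{
	assert(folio && folio->mapping && folio->mapping->host);

	struct inode *inode = folio->mapping->host;

	if (MODEL_S_ISREG(inode->i_mode))
		return folio->mapping->a_ops->dirty_folio(folio);
	return false;
}
```

Calling model_dirty_journalled_data() on a folio whose host inode has mode
MODEL_S_IFLNK and a NULL a_ops returns false and leaves the folio clean,
while a MODEL_S_IFREG inode wired to model_aops gets its folio dirtied.
Dropping the MODEL_S_ISREG() check turns the symlink case into a NULL
pointer dereference, which is the user-space analogue of the oops in the
call trace quoted above.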