lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4d4dahpbk33ntrqaz26lahxpqx2zox2jwowz6thnlzdy4u2sd4@zhpgpwtga5sd>
Date: Thu, 4 Sep 2025 11:06:10 +0200
From: Jan Kara <jack@...e.cz>
To: Ritesh Harjani <ritesh.list@...il.com>
Cc: Jan Kara <jack@...e.cz>, Sun Yongjian <sunyongjian1@...wei.com>, 
	linux-ext4@...r.kernel.org, yangerkun@...wei.com, yi.zhang@...wei.com, libaokun1@...wei.com, 
	tytso@....edu
Subject: Re: [PATCH -next] ext4: add an update to i_disksize in
 ext4_block_page_mkwrite

On Thu 04-09-25 09:33:56, Ritesh Harjani wrote:
> Jan Kara <jack@...e.cz> writes:
> > On Mon 01-09-25 15:01:45, Sun Yongjian wrote:
> >> 在 2025/7/31 22:05, sunyongjian@...weicloud.com 写道:
> >> Gentle ping.
> >> > From: Yongjian Sun <sunyongjian1@...wei.com>
> >> > 
> >> > After running a stress test combined with fault injection,
> >> > we performed fsck -a followed by fsck -fn on the filesystem
> >> > image. During the second pass, fsck -fn reported:
> >> > 
> >> > Inode 131512, end of extent exceeds allowed value
> >> > 	(logical block 405, physical block 1180540, len 2)
> >> > 
> >> > This inode was not in the orphan list.
> >
> > Thanks for report! Interesting... Which kernel were you using?
> >
> >> > Analysis revealed the
> >> > following call chain that leads to the inconsistency:
> >> > 
> >> >                               ext4_da_write_end()
> >> >                                //does not update i_disksize
> >
> > Right, for any write beyond i_disksize to unallocated blocks we update
> > i_disksize only during page writeback.
> >
> >> >                               ext4_punch_hole()
> >> >                                //truncate folio, keep size
> >
> > So here offset + len passed to ext4_punch_hole() is important. Because
> > there's ext4_update_disksize_before_punch() call which updates i_disksize
> > to i_size if the punched hole reaches EOF. So did you punch hole in the
> > middle of the file?
> >
> >> > ext4_page_mkwrite()
> >> >   ext4_block_page_mkwrite()
> >> >    ext4_block_write_begin()
> >> >      ext4_get_block()
> >> >       //insert written extent without update i_disksize
> >
> > We should insert unwritten extent here, shouldn't we? We use
> > ext4_get_block_unwritten() when we are inside i_size. Ah, you mention below
> > you use nodioread_nolock. Nasty :)
> >
> >> > journal commit
> >> > echo 1 > /sys/block/xxx/device/delete
> >> > 
> >> > da-write path updates i_size but does not update i_disksize. Then
> >> > ext4_punch_hole truncates the da-folio yet still leaves i_disksize
> >> > unchanged. Then ext4_page_mkwrite sees ext4_nonda_switch return 1
> >> > and takes the nodioread_nolock path, the folio about to be written
> >> > has just been punched out, and it’s offset sits beyond the current
> >> > i_disksize. This may result in a written extent being inserted, but
> >> > again does not update i_disksize. If the journal gets committed and
> >> > then the block device is yanked, we might run into this.
> >> > 
> >> > To fix this, we now check in ext4_block_page_mkwrite whether
> >> > i_disksize needs to be updated to cover the newly allocated blocks.
> >> > 
> >> > Signed-off-by: Yongjian Sun <sunyongjian1@...wei.com>
> >
> > Hum, rather than complicating this niche code what if we just
> > unconditionally used ext4_get_block_unwritten() in
> > ext4_block_page_mkwrite() when delalloc gets disabled? It is far from any
> > performance critical path. What do people think? The code would actually
> > have to be something like:
> >
> > 	if (ext4_should_journal_data(inode))
> > 		get_block = ext4_get_block;
> > 	else
> > 		get_block = ext4_get_block_unwritten;
> >
> 
> The problem mainly was when e2fsck identify a written extent beyond
> i_disksize. But if it is unwritten extent, then we are still good. So
> using ext4_get_block_unwritten() by default in page fault path make
> sense.
> 
> So what about other checks like S_ISREG() and file with indirect blocks
> in ext4_should_dioread_nolock()? We still need ext4_get_block() for them right? 
> (Do we even fault on !S_ISREG() :) ?)

S_ISREG() is always true for page faults but I forgot about indirect block
based inodes, there we indeed cannot create unwritten extent. Drat, so
probably what Sun suggests is the easiest solution in the end. Going back
to his patch.

								Honza

-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ