lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ixlyfqaobk4whctod5wwhusqeeduqxamni6zkxl2wdlbtcyms2@intsywwjfv25>
Date: Mon, 26 May 2025 15:43:31 +0200
From: Jan Kara <jack@...e.cz>
To: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Cc: Theodore Ts'o <tytso@....edu>, Jan Kara <jack@...e.com>, 
	Tao Ma <boyu.mt@...bao.com>, Andreas Dilger <adilger.kernel@...ger.ca>, 
	Eric Biggers <ebiggers@...gle.com>, linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, kernel-dev@...lia.com, 
	syzbot+0c89d865531d053abb2d@...kaller.appspotmail.com, stable@...r.kernel.org
Subject: Re: [PATCH] ext4: inline: do not convert when writing to memory map

On Wed 21-05-25 18:52:03, Thadeu Lima de Souza Cascardo wrote:
> On Tue, May 20, 2025 at 10:57:08AM -0400, Theodore Ts'o wrote:
> > On Mon, May 19, 2025 at 07:42:46AM -0300, Thadeu Lima de Souza Cascardo wrote:
> > > inline data handling has a race between writing and writing to a memory
> > > map.
> > > 
> > > When ext4_page_mkwrite is called, it calls ext4_convert_inline_data, which
> > > destroys the inline data, but if block allocation fails, restores the
> > > inline data. In that process, we could have:
> > > 
> > > CPU1					CPU2
> > > destroy_inline_data
> > > 					write_begin (does not see inline data)
> > > restory_inline_data
> > > 					write_end (sees inline data)
> > > 
> > > The conversion inside ext4_page_mkwrite was introduced at commit
> > > 7b4cc9787fe3 ("ext4: evict inline data when writing to memory map"). This
> > > fixes a documented bug in the commit message, which suggests some
> > > alternatives fixes.
> > 
> > Your fix just reverts commit 7b4cc9787fe3, and removes the BUG_ON.
> > While this is great for shutting up the syzbot report, but it causes
> > file writes to an inline data file via a mmap to never get written
> > back to the storage device.  So you are replacing BUG_ON that can get
> > triggered on a race condition in case of a failed block allocation,
> > with silent data corruption.   This is not an improvement.
> > 
> > Thanks for trying to address this, but I'm not going to accept your
> > proposed fix.
> > 
> >      	    	 	       	       - Ted
> 
> Hi, Ted.
> 
> I am trying to understand better the circumstances where the data loss
> might occur with the fix, but might not occur without the fix. Or, even if
> they occur either way, such that I can work on a better/proper fix.
> 
> Right now, if ext4_convert_inline_data (called from ext4_page_mkwrite)
> fails with ENOSPC, the memory access will lead to a SIGBUS. The same will
> happen without the fix, if there are no blocks available.
> 
> Now, without ext4_convert_inline_data, blocks will be allocated by
> ext4_page_mkwrite and written by ext4_do_writepages. Are you concerned
> about a failure between the clearing of the inode data and the writing of
> the block in ext4_do_writepages?
> 
> Or are you concerned about a potential race condition when allocating
> blocks?
> 
> Which of these cannot happen today with the code as is? If I understand
> correctly, the inline conversion code also calls ext4_destroy_inline_data
> before allocating and writing to blocks.
> 
> Thanks a lot for the review and guidance.

So I'm not sure what Ted was exactly worried about because writeback code
should normally allocate underlying blocks for writeout of the mmaped page
AFAICT. But the problem I can see is that clearing
EXT4_STATE_MAY_INLINE_DATA requires i_rwsem held as otherwise we may be
racing with e.g. write(2) and switching EXT4_STATE_MAY_INLINE_DATA in the
middle of the write will cause bad things (inconsistency between how
write_begin() and write_end() callbacks behave).

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ