[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ixlyfqaobk4whctod5wwhusqeeduqxamni6zkxl2wdlbtcyms2@intsywwjfv25>
Date: Mon, 26 May 2025 15:43:31 +0200
From: Jan Kara <jack@...e.cz>
To: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Cc: Theodore Ts'o <tytso@....edu>, Jan Kara <jack@...e.com>,
Tao Ma <boyu.mt@...bao.com>, Andreas Dilger <adilger.kernel@...ger.ca>,
Eric Biggers <ebiggers@...gle.com>, linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, kernel-dev@...lia.com,
syzbot+0c89d865531d053abb2d@...kaller.appspotmail.com, stable@...r.kernel.org
Subject: Re: [PATCH] ext4: inline: do not convert when writing to memory map
On Wed 21-05-25 18:52:03, Thadeu Lima de Souza Cascardo wrote:
> On Tue, May 20, 2025 at 10:57:08AM -0400, Theodore Ts'o wrote:
> > On Mon, May 19, 2025 at 07:42:46AM -0300, Thadeu Lima de Souza Cascardo wrote:
> > > inline data handling has a race between writing and writing to a memory
> > > map.
> > >
> > > When ext4_page_mkwrite is called, it calls ext4_convert_inline_data, which
> > > destroys the inline data, but if block allocation fails, restores the
> > > inline data. In that process, we could have:
> > >
> > > CPU1 CPU2
> > > destroy_inline_data
> > > write_begin (does not see inline data)
> > > restory_inline_data
> > > write_end (sees inline data)
> > >
> > > The conversion inside ext4_page_mkwrite was introduced at commit
> > > 7b4cc9787fe3 ("ext4: evict inline data when writing to memory map"). This
> > > fixes a documented bug in the commit message, which suggests some
> > > alternatives fixes.
> >
> > Your fix just reverts commit 7b4cc9787fe3, and removes the BUG_ON.
> > While this is great for shutting up the syzbot report, but it causes
> > file writes to an inline data file via a mmap to never get written
> > back to the storage device. So you are replacing BUG_ON that can get
> > triggered on a race condition in case of a failed block allocation,
> > with silent data corruption. This is not an improvement.
> >
> > Thanks for trying to address this, but I'm not going to accept your
> > proposed fix.
> >
> > - Ted
>
> Hi, Ted.
>
> I am trying to understand better the circumstances where the data loss
> might occur with the fix, but might not occur without the fix. Or, even if
> they occur either way, such that I can work on a better/proper fix.
>
> Right now, if ext4_convert_inline_data (called from ext4_page_mkwrite)
> fails with ENOSPC, the memory access will lead to a SIGBUS. The same will
> happen without the fix, if there are no blocks available.
>
> Now, without ext4_convert_inline_data, blocks will be allocated by
> ext4_page_mkwrite and written by ext4_do_writepages. Are you concerned
> about a failure between the clearing of the inode data and the writing of
> the block in ext4_do_writepages?
>
> Or are you concerned about a potential race condition when allocating
> blocks?
>
> Which of these cannot happen today with the code as is? If I understand
> correctly, the inline conversion code also calls ext4_destroy_inline_data
> before allocating and writing to blocks.
>
> Thanks a lot for the review and guidance.
So I'm not sure what Ted was exactly worried about because writeback code
should normally allocate underlying blocks for writeout of the mmaped page
AFAICT. But the problem I can see is that clearing
EXT4_STATE_MAY_INLINE_DATA requires i_rwsem held as otherwise we may be
racing with e.g. write(2) and switching EXT4_STATE_MAY_INLINE_DATA in the
middle of the write will cause bad things (inconsistency between how
write_begin() and write_end() callbacks behave).
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists