[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20231130095522.zlgumj6zuqyygdzu@quack3>
Date: Thu, 30 Nov 2023 10:55:22 +0100
From: Jan Kara <jack@...e.cz>
To: Ritesh Harjani <ritesh.list@...il.com>
Cc: Jan Kara <jack@...e.cz>, Ted Tso <tytso@....edu>,
linux-ext4@...r.kernel.org,
syzbot+47479b71cdfc78f56d30@...kaller.appspotmail.com
Subject: Re: [PATCH] ext4: Fix warning in ext4_dio_write_end_io()
On Thu 23-11-23 15:17:28, Ritesh Harjani wrote:
> Jan Kara <jack@...e.cz> writes:
>
> > On Thu 23-11-23 12:37:03, Ritesh Harjani wrote:
> >> Jan Kara <jack@...e.cz> writes:
> >>
> >> > The syzbot has reported that it can hit the warning in
> >> > ext4_dio_write_end_io() because i_size < i_disksize. Indeed the
> >> > reproducer creates a race between DIO IO completion and truncate
> >> > expanding the file and thus ext4_dio_write_end_io() sees an inconsistent
> >> > inode state where i_disksize is already updated but i_size is not
> >> > updated yet. Since we are careful when setting up DIO write and consider
> >> > it extending (and thus performing the IO synchronously with i_rwsem held
> >> > exclusively) whenever it goes past either of i_size or i_disksize, we
> >> > can use the same test during IO completion without risking entering
> >> > ext4_handle_inode_extension() without i_rwsem held. This way we make it
> >> > obvious both i_size and i_disksize are large enough when we report DIO
> >> > completion without relying on unreliable WARN_ON.
> >>
> >> Does it make sense to add this in ext4_handle_inode_extension()?
> >> WARN_ON_ONCE(!inode_is_locked(inode));
> >> Ohk, we already have "lockdep_assert_held_write(&inode->i_rwsem)" so
> >> hopefully it can catch via lockdep.
> >
> > Exactly.
> >
> >> So, IIUC, the WARN happened when we were doing a non-extending
> >> AIO-DIO write which was racing with truncate trying to expand the file
> >> size. Because only then the DIO completion will not have i_rwsem held
> >> which can race with truncate. Truncate since it is expanding the file
> >> size, will not use inode_dio_wait() (since no block allocations).
> >>
> >> Is this understanding correct?
> >
> > Yes, correct.
>
> Thanks Jan,
>
> Also ext4_inode_extension_cleanup() function can take care of deleting
> the inode from the orphan list in case if there is a race with truncate
> which extended made both i_disksize and inode->i_size and the DIO
> completion couldn't call ext4_handle_inode_extension(), right?
>
> In that case, does it make sense to update a comment here too?
>
> @@ -350,7 +350,10 @@ static void ext4_inode_extension_cleanup(struct inode *inode, ssize_t count)
> }
> /*
> * If i_disksize got extended due to writeback of delalloc blocks while
> - * the DIO was running we could fail to cleanup the orphan list in
> + * the DIO was running, or
> + * If i_disksize and inode->i_size both got extened during truncate
> + * which raced with DIO completion,
> + * In both such cases, we could fail to cleanup the orphan list in
> * ext4_handle_inode_extension(). Do it now.
> */
> if (!list_empty(&EXT4_I(inode)->i_orphan) && inode->i_nlink) {
Good point. Expanded comment in this way. I'll send v2 shortly.
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists