Message-ID: <20250514164050.GN25655@frogsfrogsfrogs>
Date: Wed, 14 May 2025 09:40:50 -0700
From: "Darrick J. Wong" <djwong@...nel.org>
To: Ritesh Harjani <ritesh.list@...il.com>
Cc: linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>,
Jan Kara <jack@...e.cz>, John Garry <john.g.garry@...cle.com>,
Ojaswin Mujoo <ojaswin@...ux.ibm.com>,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v3 0/7] ext4: Add multi-fsblock atomic write support with bigalloc

On Fri, May 09, 2025 at 11:12:46PM +0530, Ritesh Harjani wrote:
> "Ritesh Harjani (IBM)" <ritesh.list@...il.com> writes:
>
> > This is v3 of multi-fsblock atomic write support using bigalloc. The series
> > is now in much better shape. Most of the design changes are in patches 4
> > and 5.
> >
> > The series is now ready for careful review, as all the error-handling
> > code paths should be properly taken care of.
> >
>
> We spotted that the multi-fsblock changes might need to force a journal
> commit if the underlying region has mixed mappings, e.g. alternating
> written (W) and unwritten (U) extents such as WUWUWUW...
>
> The issue arises when, during block allocation, the unwritten ranges are
> first zeroed out, followed by the unwritten-to-written extent
> conversion. This conversion is part of a journaled metadata transaction
> that has not yet been committed, as the transaction is still running.
> If an iomap write then modifies the data on those multi-fsblocks and a
> sudden power loss occurs before the transaction commits, the
> unwritten-to-written conversion will not be replayed during journal
> recovery. As a result, we end up with new data in the blocks that were
> already written, while the alternate unwritten blocks read back as
> zeroes. This could cause torn-write behavior for atomic writes.
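
The failure sequence described above can be sketched as a small userspace
model in C. This is only an illustration under stated assumptions, not ext4
code: struct model, the model_*() helpers and simulate() are hypothetical
names invented here, the range is assumed to be 4 fs blocks laid out as WUWU,
and each block's contents are reduced to a single byte ('o' for old data,
'n' for new data, 0 for zeroes).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLK 4	/* assumed size of the atomic write, in fs blocks */

enum blk_state { UNWRITTEN, WRITTEN };

/* Toy model of one multi-fsblock range; not an ext4 data structure. */
struct model {
	enum blk_state committed[NBLK];	/* extent state that journal recovery restores */
	enum blk_state running[NBLK];	/* extent state in the running transaction */
	char data[NBLK];		/* one byte of "contents" per block on the device */
};

/* layout is a string such as "WUWU": W = written, U = unwritten. */
void model_init(struct model *m, const char *layout)
{
	assert(strlen(layout) == NBLK);
	for (int i = 0; i < NBLK; i++) {
		m->committed[i] = layout[i] == 'W' ? WRITTEN : UNWRITTEN;
		m->running[i] = m->committed[i];
		m->data[i] = 'o';	/* old data (stale for unwritten blocks) */
	}
}

/* Allocation step: zero the unwritten blocks, convert them in the running txn. */
void model_zero_and_convert(struct model *m)
{
	for (int i = 0; i < NBLK; i++) {
		if (m->running[i] == UNWRITTEN) {
			m->data[i] = 0;
			m->running[i] = WRITTEN;
		}
	}
}

/* Journal commit: the running transaction becomes durable. */
void model_commit(struct model *m)
{
	memcpy(m->committed, m->running, sizeof(m->committed));
}

/* Power loss: the uncommitted transaction is not replayed. */
void model_crash(struct model *m)
{
	memcpy(m->running, m->committed, sizeof(m->running));
}

/* Read back the range: unwritten blocks read as zeroes. */
void model_read(const struct model *m, char out[NBLK])
{
	for (int i = 0; i < NBLK; i++)
		out[i] = m->committed[i] == WRITTEN ? m->data[i] : 0;
}

/* Run the whole sequence, optionally committing before the data write. */
void simulate(bool commit_before_write, char out[NBLK])
{
	struct model m;

	model_init(&m, "WUWU");
	model_zero_and_convert(&m);
	if (commit_before_write)
		model_commit(&m);
	memset(m.data, 'n', NBLK);	/* the atomic data write lands on the device */
	model_crash(&m);
	model_read(&m, out);
}
```

In this model, simulate(false, out) reads back new data in blocks 0 and 2 and
zeroes in blocks 1 and 3, which is the torn result. simulate(true, out) reads
back new data in all four blocks, because the conversion was committed before
the data write.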
>
> So we were thinking we might need something like this. Hopefully this
> should still be OK, as the mixed-mapping case is mostly not a
> performance-critical path. Thoughts?

I agree the journal has to be written out before the atomic write is
sent to the device.

--D

>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2642e1ef128f..59b59d609976 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3517,7 +3517,8 @@ static int ext4_map_blocks_atomic_write_slow(handle_t *handle,
>   * underlying short holes/unwritten extents within the requested range.
>   */
>  static int ext4_map_blocks_atomic_write(handle_t *handle, struct inode *inode,
> -				struct ext4_map_blocks *map, int m_flags)
> +				struct ext4_map_blocks *map, int m_flags,
> +				bool *force_commit)
>  {
>  	ext4_lblk_t m_lblk = map->m_lblk;
>  	unsigned int m_len = map->m_len;
> @@ -3537,6 +3538,11 @@ static int ext4_map_blocks_atomic_write(handle_t *handle, struct inode *inode,
>  	map->m_len = m_len;
>  	map->m_flags = 0;
>
> +	/*
> +	 * slow path means we have mixed mapping, that means we will need
> +	 * to force txn commit.
> +	 */
> +	*force_commit = true;
>  	return ext4_map_blocks_atomic_write_slow(handle, inode, map);
>  out:
>  	return ret;
> @@ -3548,6 +3554,7 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
>  	handle_t *handle;
>  	u8 blkbits = inode->i_blkbits;
>  	int ret, dio_credits, m_flags = 0, retries = 0;
> +	bool force_commit = false;
>
>  	/*
>  	 * Trim the mapping request to the maximum value that we can map at
> @@ -3610,7 +3617,8 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
>  		m_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT;
>
>  	if (flags & IOMAP_ATOMIC)
> -		ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags);
> +		ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags,
> +						   &force_commit);
>  	else
>  		ret = ext4_map_blocks(handle, inode, map, m_flags);
>
> @@ -3626,6 +3634,9 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
>  	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
>  		goto retry;
>
> +	if (ret > 0 && force_commit)
> +		ext4_force_commit(inode->i_sb);
> +
>  	return ret;
>  }
>
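
The control flow of the patch above (a helper reports through an out
parameter that a commit is needed, and the caller forces the commit only after
a successful mapping) can be sketched in userspace C. Everything named toy_*
below is hypothetical and made up for this illustration; none of it is an ext4
or jbd2 interface. One simplification to note: the sketch detects a mixed
mapping by scanning a layout string, whereas the patch sets the flag whenever
it falls into the slow path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the journal; it only counts forced commits. */
struct toy_journal {
	int commits;
};

/*
 * Toy mapping helper. layout is a string such as "WUWU" describing the
 * range (W = written, U = unwritten). Returns the number of blocks mapped
 * (> 0) or a negative value on error, and sets *force_commit when the
 * range has mixed mappings.
 */
int toy_map_atomic(const char *layout, size_t len, bool *force_commit)
{
	assert(force_commit != NULL);
	if (len == 0)
		return -1;
	for (size_t i = 1; i < len; i++) {
		if (layout[i] != layout[0]) {
			*force_commit = true;
			break;
		}
	}
	return (int)len;
}

/* Toy caller: force a commit only if the mapping succeeded and asked for it. */
int toy_iomap_alloc(struct toy_journal *j, const char *layout, size_t len)
{
	bool force_commit = false;
	int ret = toy_map_atomic(layout, len, &force_commit);

	if (ret > 0 && force_commit)
		j->commits++;	/* stands in for ext4_force_commit(inode->i_sb) */
	return ret;
}
```

With this model, a uniform range such as "WWWW" maps without a forced commit,
a mixed range such as "WUWU" triggers exactly one, and a failed mapping
triggers none, so the extra cost stays confined to the mixed-mapping case.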
>
> -ritesh
>