linux-ext4 - Re: [PATCH v3 0/7] ext4: Add multi-fsblock atomic write support with bigalloc

Message-ID: <20250514164050.GN25655@frogsfrogsfrogs>
Date: Wed, 14 May 2025 09:40:50 -0700
From: "Darrick J. Wong" <djwong@...nel.org>
To: Ritesh Harjani <ritesh.list@...il.com>
Cc: linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>,
	Jan Kara <jack@...e.cz>, John Garry <john.g.garry@...cle.com>,
	Ojaswin Mujoo <ojaswin@...ux.ibm.com>,
	linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v3 0/7] ext4: Add multi-fsblock atomic write support with
 bigalloc

On Fri, May 09, 2025 at 11:12:46PM +0530, Ritesh Harjani wrote:
> "Ritesh Harjani (IBM)" <ritesh.list@...il.com> writes:
> 
> > This is v3 of multi-fsblock atomic write support using bigalloc. The series is
> > now in much better shape. Most of the design changes are in patches 4 and 5.
> >
> > This series can now be carefully reviewed, as all the error handling related
> > code paths should be properly taken care of.
> >
> 
> We spotted that the multi-fsblock changes might need to force a journal
> commit if the underlying region has mixed mappings, e.g. WUWUWUW...
> 
> The issue arises when, during block allocation, the unwritten ranges are
> first zeroed out, followed by the unwritten-to-written extent
> conversion. This conversion is part of a journaled metadata transaction
> that has not yet been committed, as the transaction is still running.
> If an iomap write then modifies the data on those multi-fsblocks and a
> sudden power loss occurs before the transaction commits, the
> unwritten-to-written conversion will not be replayed during journal
> recovery. As a result, we end up with new data written over the mapped
> blocks, while the alternating unwritten blocks read back as zeroes. This
> could cause torn-write behavior for atomic writes.
> 
> So we were thinking we might need something like this. Hopefully this
> should still be ok, as the mixed-mapping case is mostly not a
> performance-critical path. Thoughts?

I agree the journal has to be written out before the atomic write is
sent to the device.

--D

> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2642e1ef128f..59b59d609976 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3517,7 +3517,8 @@ static int ext4_map_blocks_atomic_write_slow(handle_t *handle,
>   * underlying short holes/unwritten extents within the requested range.
>   */
>  static int ext4_map_blocks_atomic_write(handle_t *handle, struct inode *inode,
> -                               struct ext4_map_blocks *map, int m_flags)
> +                               struct ext4_map_blocks *map, int m_flags,
> +                               bool *force_commit)
>  {
>         ext4_lblk_t m_lblk = map->m_lblk;
>         unsigned int m_len = map->m_len;
> @@ -3537,6 +3538,11 @@ static int ext4_map_blocks_atomic_write(handle_t *handle, struct inode *inode,
>         map->m_len = m_len;
>         map->m_flags = 0;
> 
> +       /*
> +        * slow path means we have mixed mapping, that means we will need
> +        * to force txn commit.
> +        */
> +       *force_commit = true;
>         return ext4_map_blocks_atomic_write_slow(handle, inode, map);
>  out:
>         return ret;
> @@ -3548,6 +3554,7 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
>         handle_t *handle;
>         u8 blkbits = inode->i_blkbits;
>         int ret, dio_credits, m_flags = 0, retries = 0;
> +       bool force_commit = false;
> 
>         /*
>          * Trim the mapping request to the maximum value that we can map at
> @@ -3610,7 +3617,8 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
>                 m_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT;
> 
>         if (flags & IOMAP_ATOMIC)
> -               ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags);
> +               ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags,
> +                                                  &force_commit);
>         else
>                 ret = ext4_map_blocks(handle, inode, map, m_flags);
> 
> @@ -3626,6 +3634,9 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
>         if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
>                 goto retry;
> 
> +       if (ret > 0 && force_commit)
> +               ext4_force_commit(inode->i_sb);
> +
>         return ret;
>  }
> 
> 
> -ritesh
>
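
The torn-write scenario described in the quoted message can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions, not ext4 or jbd2 code: every name in it (toy_fs, toy_alloc_mixed, toy_commit and so on) is hypothetical, the "journal" is reduced to a running copy and a durable copy of the extent state, and the data write is assumed to reach the device in full before the crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4

enum ext_state { UNWRITTEN, WRITTEN };

/* Hypothetical toy model: one small region with a WUWU mapping. */
struct toy_fs {
	int data[NBLOCKS];               /* block contents on the device */
	enum ext_state running[NBLOCKS]; /* extent state in the running txn (memory only) */
	enum ext_state durable[NBLOCKS]; /* extent state as of the last journal commit */
};

static void toy_init(struct toy_fs *fs)
{
	for (int i = 0; i < NBLOCKS; i++) {
		bool written = (i % 2 == 0);

		fs->data[i] = written ? 1 : 99; /* 1 = old data, 99 = stale contents */
		fs->running[i] = written ? WRITTEN : UNWRITTEN;
		fs->durable[i] = fs->running[i];
	}
}

/* Commit: the running extent state becomes the state that survives a crash. */
static void toy_commit(struct toy_fs *fs)
{
	memcpy(fs->durable, fs->running, sizeof(fs->durable));
}

/* Slow path: zero the unwritten blocks, then convert them in the running txn. */
static void toy_alloc_mixed(struct toy_fs *fs, bool force_commit)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (fs->running[i] == UNWRITTEN) {
			fs->data[i] = 0;
			fs->running[i] = WRITTEN;
		}
	}
	if (force_commit)
		toy_commit(fs);
}

/* The data write covers the whole region and is assumed to complete. */
static void toy_atomic_write(struct toy_fs *fs, int value)
{
	for (int i = 0; i < NBLOCKS; i++) {
		assert(fs->running[i] == WRITTEN);
		fs->data[i] = value;
	}
}

/* After a crash only the durable state is left; unwritten blocks read as zero. */
static void toy_read_after_crash(const struct toy_fs *fs, int *out)
{
	for (int i = 0; i < NBLOCKS; i++)
		out[i] = (fs->durable[i] == WRITTEN) ? fs->data[i] : 0;
}

/* Returns true if a crash right after the data write exposes a torn result. */
bool toy_crash_after_write_is_torn(bool force_commit)
{
	struct toy_fs fs;
	int old_img[NBLOCKS], new_img[NBLOCKS], out[NBLOCKS];

	toy_init(&fs);
	toy_read_after_crash(&fs, old_img); /* what a reader saw before the write */
	toy_alloc_mixed(&fs, force_commit);
	toy_atomic_write(&fs, 2);
	for (int i = 0; i < NBLOCKS; i++)
		new_img[i] = 2;
	toy_read_after_crash(&fs, out);

	/* Torn: neither entirely the old image nor entirely the new one. */
	return memcmp(out, old_img, NBLOCKS * sizeof(int)) != 0 &&
	       memcmp(out, new_img, NBLOCKS * sizeof(int)) != 0;
}
```

In this model, toy_crash_after_write_is_torn(false) returns true: the region reads back as new data interleaved with zeroes, because the conversion was never made durable. With force_commit set it returns false, which mirrors the ordering the patch enforces by committing before the mapping is returned to the caller.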