Message-ID: <35939b19-088b-450e-8fa6-49165b95b1d3@oracle.com>
Date: Wed, 29 Jan 2025 08:59:15 +0000
From: John Garry <john.g.garry@...cle.com>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>, lsf-pc@...ts.linux-foundation.org
Cc: linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
djwong@...nel.org, dchinner@...hat.com, hch@....de,
ritesh.list@...il.com, jack@...e.cz, tytso@....edu,
linux-ext4@...r.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] extsize and forcealign design in filesystems
for atomic writes
On 29/01/2025 07:06, Ojaswin Mujoo wrote:
Hi Ojaswin,
>
> I would like to submit a proposal to discuss the design of extsize and
> forcealign and various open questions around it.
>
> ** Background **
>
> Modern NVMe/SCSI disks with atomic write capabilities can allow writes to a
> multi-KB range on disk to go atomically. This feature has a wide variety of use
> cases especially for databases like mysql and postgres that can leverage atomic
> writes to gain significant performance. However, in order to enable atomic
> writes on Linux, the underlying disk may have some size and alignment
> constraints that the upper layers like filesystems should follow. extsize with
> forcealign is one of the ways filesystems can make sure the IO submitted to the
> disk adheres to the atomic writes constraints.
>
> extsize is a hint to the FS to allocate extents at a certain logical alignment
> and size. forcealign builds on this by forcing the allocator to enforce the
> alignment guarantees for physical blocks as well, which is essential for atomic
> writes.
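To make the two layers of constraint above concrete, here is a minimal sketch. It assumes the usual rules as I understand them (an atomic write has a power-of-2 length, lies within the advertised unit min/max, and is naturally aligned to its length) and models forcealign as "both the logical and the physical start of an extent are multiples of extsize". The function names and parameters are hypothetical and are not kernel or filesystem APIs.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; False for 0, negatives and everything else."""
    return n > 0 and (n & (n - 1)) == 0


def atomic_write_ok(offset: int, length: int, unit_min: int, unit_max: int) -> bool:
    """Hypothetical check of an atomic write against size/alignment limits.

    Assumed rules (not taken from this thread): the length is a power
    of 2, lies within [unit_min, unit_max], and the offset is naturally
    aligned to the length. All values are in bytes.
    """
    if not is_power_of_two(length):
        return False
    if not (unit_min <= length <= unit_max):
        return False
    return offset % length == 0


def extent_is_forcealigned(logical_start: int, physical_start: int, extsize: int) -> bool:
    """Hypothetical model of the forcealign guarantee for one extent.

    extsize alone only hints at logical alignment; forcealign is modelled
    here as additionally requiring the physical start to be aligned.
    All values are in filesystem blocks.
    """
    return logical_start % extsize == 0 and physical_start % extsize == 0
```

For example, with unit_min=4096 and unit_max=65536, a 16 KiB write at offset 16384 passes the check, while the same write at offset 4096 does not. An extent whose logical start is aligned but whose physical start is not would satisfy an extsize hint in this model but not forcealign.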
>
> ** Points of discussion **
>
> The extsize hints feature is already supported by XFS [1], with forcealign
> still under development and discussion [2].
From the thread at
https://lore.kernel.org/linux-xfs/20241212013433.GC6678@frogsfrogsfrogs/
the alternative to forcealign for XFS is a software-emulated fallback
for unaligned atomic writes. I am looking at a PoC implementation now.
Note that this relies on CoW.

There has been pushback on forcealign for XFS, so we need to prove or
disprove that this software-emulated fallback can work; see
https://lore.kernel.org/linux-xfs/20240924061719.GA11211@lst.de/
> After taking a look at ext4's multi-block
> allocator design, supporting extsize with forcealign can be done in ext4 as
> well. There is an RFC that adds support for the extsize hints feature in
> ext4 [3]. However, there are some caveats and deviations from the XFS design.
> With these in mind, I would like to propose an LSFMM topic on:
>
> * exact semantics of extsize w/ forcealign which can bring a consistent
> interface among ext4 and xfs and possibly any other FS that plans to
> implement them in the future.
>
> * Documenting how forcealign with extsize should behave with various FS
> operations like fallocate, truncate, punch hole, insert/collapse range etc.
>
> * Implementing extsize with delayed allocation and the challenges there.
>
> * Discussing tooling support of forcealign like how are we planning to maintain
> block alignment guarantees during fsck, resize and other times where we might
> need to move blocks around?
>
> * Documenting any areas where FSes might differ in their implementations of the
> same. Example, ext4 doesn't plan to support non power of 2 extsizes whereas
> XFS has support for that.
>
> Hopefully this discussion will help define consistent semantics for
> extsize hints and forcealign, which may be useful for other FS
> developers too.
>
> Thoughts and suggestions are welcome.
>
> References:
> [1] https://man7.org/linux/man-pages/man2/ioctl_xfs_fsgetxattr.2.html
> [2] https://lore.kernel.org/linux-xfs/20240813163638.3751939-1-john.g.garry@oracle.com/
> [3] https://lore.kernel.org/linux-ext4/cover.1733901374.git.ojaswin@linux.ibm.com/
>
> Regards,
> ojaswin