linux-kernel - Re: [RFC 0/5] ext4: Implement support for extsize hints

Message-ID: <72f78079-986a-45ac-8f5c-489dd824615e@oracle.com>
Date: Fri, 13 Sep 2024 14:34:35 +0100
From: John Garry <john.g.garry@...cle.com>
To: "Ritesh Harjani (IBM)" <ritesh.list@...il.com>,
        Ojaswin Mujoo <ojaswin@...ux.ibm.com>
Cc: linux-kernel@...r.kernel.org, "Darrick J . Wong" <djwong@...nel.org>,
        linux-fsdevel@...r.kernel.org, dchinner@...hat.com,
        linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>
Subject: Re: [RFC 0/5] ext4: Implement support for extsize hints

On 13/09/2024 11:54, Ritesh Harjani (IBM) wrote:
> John Garry <john.g.garry@...cle.com> writes:
> 
>> On 11/09/2024 10:01, Ojaswin Mujoo wrote:
>>> This patchset implements extsize hint feature for ext4. Posting this RFC to get
>>> some early review comments on the design and implementation bits. This feature
>>> is similar to what we have in XFS too with some differences.
>>>
>>> extsize on ext4 is a hint to mballoc (multi-block allocator) and extent
>>> handling layer to do aligned allocations. We use allocation criteria 0
>>> (CR_POWER2_ALIGNED) for doing aligned power-of-2 allocations. With extsize hint
>>> we try to align the logical start (m_lblk) and length (m_len) of the allocation
>>> to be extsize aligned. The CR_POWER2_ALIGNED criterion in mballoc automatically
>>> makes sure that we get an aligned physical start (m_pblk) as well. In this way
>>> extsize can make sure that lblk, len and pblk are all aligned for the allocated
>>> extent w.r.t. extsize.
>>>
>>> Note that extsize feature is just a hinting mechanism to ext4 multi-block
>>> allocator. That means that if we are unable to get an aligned allocation for
>>> some reason, then we drop this flag and continue with unaligned allocation to
>>> serve the request. However, when we add atomic/untorn writes support, we
>>> will enforce the aligned allocation and can return -ENOSPC if aligned
>>> allocation was not successful.
>>
>> A few questions/confirmations:
>> - You have no intention of adding an equivalent of forcealign, right?
> 
> extsize is just a hinting mechanism, and only for the __allocation__
> path. But for atomic writes we do require some form of forcealign (like
> how we have in XFS). So we could either call this directly the atomic
> write feature, or may as well call this the forcealign feature and make
> atomic writes depend upon it, like how XFS is doing it.
> 
> I still haven't understood if there is/will be a user specifically for
> forcealign other than atomic writes.
> 
> Since you asked, I am more curious to know if there is some more context
> to your question?

As Darrick mentioned at the following, forcealign could be used for DAX:
https://lore.kernel.org/linux-xfs/170404855884.1770028.10371509002317647981.stgit@frogsfrogsfrogs/

> 
>>
>> - Would you also plan on using FS_IOC_FS(GET/SET)XATTR interface for
>> enabling atomic writes on a per-inode basis?
> 
> Yes, that interface should indeed be kept same for EXT4 too.
> 
>>
>> - Can extsize be set at mkfs time?
> 
> Good point. For now in this series, extsize can only be set using the
> same ioctl on a per inode basis.
> 
> IIUC, XFS supports doing both, right? We can do this on a per-inode basis
> via the ioctl, or it can also be set at mkfs.xfs time.

Right

> (maybe xfsprogs only allows setting this at mkfs time for rtvolumes for now)

extsize hint can already be set at mkfs time for both rtvol and !rtvol 
today.

> 
> So if this is set at mkfs.xfs time, then by default all inodes will
> have this extsize attribute value set, right?

Right

But there is still the option to set this later with xfs_io -c "extsize" 
per-inode.

> 
> BTW, this brings me to another question that I had asked here too [1].
> 1. For XFS, atomic writes can only be enabled with a fresh mkfs.xfs -d
> atomic-writes=1 right?

Correct

Setting atomic-writes=1 enables the feature in the SB

> 2. For atomic writes to be enabled, we need all 3 features to be
> enabled during mkfs.xfs time itself right?

Right, that is how it is currently done.  But you could easily set 
extsize=4K at mkfs time so that not all inodes have a 16KB extsize, as 
in the example below. In this case, certain atomic write inodes could 
have their extsize increased to 16KB.

> i.e.
> "mkfs.xfs -i forcealign=1 -d extsize=16384 -d atomic-writes=1"
> 
> [1]: https://lore.kernel.org/linux-xfs/20240817094800.776408-1-john.g.garry@oracle.com/
> 
>>
>> - Is there any userspace support for this series available?
> 
> Makes sense to maybe provide a userspace support link too.
> For now, a quick hack would be to just allow setting the extsize hint for
> other filesystems as well in xfs_io.
> 
> diff --git a/io/open.c b/io/open.c
> index 15850b55..6407b7e8 100644
> --- a/io/open.c
> +++ b/io/open.c
> @@ -980,7 +980,7 @@ open_init(void)
>          extsize_cmd.args = _("[-D | -R] [extsize]");
>          extsize_cmd.argmin = 0;
>          extsize_cmd.argmax = -1;
> -       extsize_cmd.flags = CMD_NOMAP_OK;
> +       extsize_cmd.flags = CMD_NOMAP_OK | CMD_FOREIGN_OK;
>          extsize_cmd.oneline =
>                  _("get/set preferred extent size (in bytes) for the open file");
>          extsize_cmd.help = extsize_help;
> 
> <e.g>
> /dev/loop6 on /mnt1/test type ext4 (rw,relatime)
> 
> root@...u:~/xt/xfsprogs-dev# ./io/xfs_io -fc "extsize" /mnt1/test/f1
> [0] /mnt1/test/f1
> root@...u:~/xt/xfsprogs-dev# ./io/xfs_io -c "extsize 16384" /mnt1/test/f1
> root@...u:~/xt/xfsprogs-dev# ./io/xfs_io -c "extsize" /mnt1/test/f1
> [16384] /mnt1/test/f1

ok
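
For readers without a patched xfs_io, the same get/set can be sketched directly against the FS_IOC_FS(GET/SET)XATTR interface discussed earlier in the thread. This is only a minimal sketch: the function names are hypothetical, and it assumes the target filesystem accepts FS_XFLAG_EXTSIZE together with fsx_extsize on a regular file in the way XFS does. Whether ext4 with this RFC applies further restrictions is not something this sketch checks.

```c
#include <assert.h>
#include <linux/fs.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: read the extent size hint (in bytes) of an open file.
 * Returns 0 on success, -1 on failure with errno set by ioctl().
 */
int get_extsize_hint(int fd, uint32_t *extsize_bytes)
{
	struct fsxattr fsx;

	assert(extsize_bytes != NULL);

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;

	*extsize_bytes = fsx.fsx_extsize;
	return 0;
}

/*
 * Hypothetical helper: set the extent size hint (in bytes) on an open
 * regular file. Returns 0 on success, -1 on failure with errno set.
 */
int set_extsize_hint(int fd, uint32_t extsize_bytes)
{
	struct fsxattr fsx;

	/* Read the current attributes first so unrelated flags are preserved. */
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;

	/* Assumption: the filesystem wants the flag and the value set together. */
	fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;
	fsx.fsx_extsize = extsize_bytes;

	if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) < 0)
		return -1;

	return 0;
}
```

A caller would open the file, call set_extsize_hint(fd, 16384) before writing any data, and then read the value back with get_extsize_hint() to confirm that the filesystem accepted it.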

> 
> 
>>
>> - how would/could extsize interact with bigalloc?
>>
> 
> As of now it is kept disabled with bigalloc.
> 
> +	if (sbi->s_cluster_ratio > 1) {
> +		msg = "Can't use extsize hint with bigalloc";
> +		err = -EINVAL;
> +		goto error;
> +	}
> 
> 
>>>
>>> Comparison with XFS extsize feature -
>>> =====================================
>>> 1. extsize in XFS is a hint for aligning only the logical start and the length
>>>      of the allocation v/s extsize on ext4 makes sure the physical start of the
>>>      extent gets aligned as well.
>>
>> note that forcealign with extsize aligns AG block also
> 
> Can you expand on that a bit? You mean during mkfs.xfs time we ensure
> agblock boundaries are extsize aligned?

Yes, see align_ag_geometry() at 
https://github.com/johnpgarry/xfsprogs-dev/commits/atomic-writes/

> 
>>
>> only for atomic writes do we enforce the AG block is aligned to physical
>> block
>>
> 
> If you could expand that a bit please? You meant during mkfs.xfs
> time for atomic writes we ensure ag block start boundaries are extsize aligned?

We do this for forcealign with the extsize value supplied at mkfs time.

There are two things to consider about this:
- mkfs-specified extsize need not necessarily be a power-of-2
- even if this mkfs-specified extsize is a power-of-2, attempting to 
increase extsize for an inode enabled for atomic writes may be 
restricted, as the new extsize may not align with the AG size.

For example, if extsize was 64KB and AG size = 16400 FSB (1025 * 64KB), 
then we cannot enable an inode for atomic writes with extsize = 128KB, 
as the disk block would not be aligned with the AG block.
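
The arithmetic in that example can be checked with a short sketch. It assumes a 4KB filesystem block, which is what makes 16400 FSB equal to 1025 * 64KB. The helper name is hypothetical and is not taken from xfsprogs or the kernel. It only tests whether an AG of a given number of blocks is a whole multiple of the extsize, so that every AG would start on an extsize boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4KB filesystem blocks, so 16400 FSB == 1025 * 64KB. */
#define FSB_BYTES 4096ULL

/*
 * Hypothetical helper: true if an AG made of ag_blocks filesystem blocks
 * is a whole multiple of extsize_bytes.
 */
bool ag_size_is_extsize_aligned(uint64_t ag_blocks, uint64_t extsize_bytes)
{
	assert(extsize_bytes != 0);

	return (ag_blocks * FSB_BYTES) % extsize_bytes == 0;
}
```

With these assumptions, ag_size_is_extsize_aligned(16400, 64 * 1024) is true (exactly 1025 units of 64KB), while ag_size_is_extsize_aligned(16400, 128 * 1024) is false (512.5 units of 128KB), which matches the restriction described above.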

> 
> 
>>>
>>> 2. eof allocation on XFS trims the blocks allocated beyond eof with extsize
>>>      hint. That means on XFS for eof allocations (with extsize hint) only logical
>>>      start gets aligned. However extsize hint in ext4 for eof allocation is not
>>>      supported in this version of the series.
>>>
>>> 3. XFS allows extsize to be set on a file with no extents but delayed data.
>>>      However, ext4 doesn't allow that, for simplicity. The user is expected to set
>>>      it on a file before changing its i_size.
>>>
>>> 4. XFS allows non-power-of-2 values for extsize but ext4 does not, since we
>>>      primarily would like to support atomic writes with extsize.
>>>
>>> 5. In ext4 we chose to store the extsize value in SYSTEM_XATTR rather than an
>>>      inode field as it was simple and most flexible, since there might be more
>>>      features like atomic/untorn writes coming in future.
>>>
>>> 6. In buffered-io path XFS switches to non-delalloc allocations for extsize hint.
>>>      The same has been kept for EXT4 as well.
>>>
>>> Some TODOs:
>>> ===========
>>> 1. EOF allocations support can be added and can be kept similar to XFS
>>
>> Note that EOF alignment for forcealign may change - it needs to be
>> discussed further.
> 
> Sure, thanks for pointing that out.
> I guess you are referring to mainly the truncate related EOF alignment change
> required with forcealign for XFS.
> 

Thanks,
John