linux-kernel - Re: [PATCH v3 08/21] xfs: Introduce FORCEALIGN inode flag

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <41612184-0804-47d6-8712-3dd99270a665@oracle.com>
Date: Thu, 2 May 2024 08:56:49 +0100
From: John Garry <john.g.garry@...cle.com>
To: Dave Chinner <david@...morbit.com>
Cc: djwong@...nel.org, hch@....de, viro@...iv.linux.org.uk, brauner@...nel.org,
        jack@...e.cz, chandan.babu@...cle.com, willy@...radead.org,
        axboe@...nel.dk, martin.petersen@...cle.com,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        tytso@....edu, jbongio@...gle.com, ojaswin@...ux.ibm.com,
        ritesh.list@...il.com, mcgrof@...nel.org, p.raghav@...sung.com,
        linux-xfs@...r.kernel.org, catherine.hoang@...cle.com
Subject: Re: [PATCH v3 08/21] xfs: Introduce FORCEALIGN inode flag

On 02/05/2024 01:50, Dave Chinner wrote:
>> For example, consider xfs_file_dio_write(), where we check for an unaligned
>> write based on forcealign extent mask. It's much simpler to rely on a
>> power-of-2 size. And same for iomap extent zeroing.
> But it's not more complex - we already do this non-power-of-2
> alignment stuff for all the realtime code, so it's just a matter
> of not blindly using bit masking in alignment checks.
> 
>> So then it can be asked, for what reason do we want to support unorthodox,
>> non-power-of-2 sizes? Who would want this?
> I'm constantly surprised by the way people use stuff like this
> filesystem and storage alignment constraints are not arbitrarily
> limited to power-of-2 sizes.
> 
> For example, code implementation is simple in RAID setups when you
> use power-of-2 chunk sizes and stripe widths. But not all storage
> hardware fits power-of-2 configs like 4+1, 4+2, 8+1, 8+2, etc. THis
> is pretty common - 2.5" 2U drive trays have 24 drive bays. If you
> want to give up 33% of the storage capacity just to use power-of-2
> stripe widths then you would use 4x4+2 RAID6 luns. However, most
> people don't want to waste that much money on redundancy. They are
> much more likely to use 2x10+2 RAID6 luns or 1x21+2 with a hot spare
> to maximise the data storage capacity.

Thanks for sharing this info

> 
> If someone wants to force-align allocation to stripe widths on such
> a RAID array config rather than trying to rely on the best effort
> swalloc mount option, then they need non-power-of-2
> alignments to be supported.
> 
> It's pretty much a no-brainer - the alignment code already handles
> non-power-of-2 alignments, and it's not very much additional code to
> ensure we can handle any alignment the user specified.

ok, fine

> 
>> As for AG size, again I think that it is required to be aligned to the
>> forcealign extsize. As I remember, when converting from an FSB to a DB, if
>> the AG itself is not aligned to the forcealign extsize, then the DB will not
>> be aligned to the forcealign extsize. More below...
>>
>>>> +	/* Requires agsize be a multiple of extsize */
>>>> +	if (mp->m_sb.sb_agblocks % extsize)
>>>> +		return __this_address;
>>>> +
>>>> +	/* Requires stripe unit+width (if set) be a multiple of extsize */
>>>> +	if ((mp->m_dalign && (mp->m_dalign % extsize)) ||
>>>> +	    (mp->m_swidth && (mp->m_swidth % extsize)))
>>>> +		return __this_address;
>>> Again, this is an atomic write constraint, isn't it?
>> So why do we want forcealign? It is to only align extent FSBs?
> Yes. forced alignment is essentially just extent size guarantees.
> 
> This is part of what is needed for atomic writes, but atomic writes
> also require specific physical storage alignment between the
> filesystem and the device. The filesystem setup has to correctly
> align AGs to the physical storage, and stuff like RAID
> configurations need to be specifically compatible with the atomic
> write capabilities of the underlying hardware.
> 
> None of these hardware iand storage stack alignment constraints have
> any relevance to the filesystem forced alignment functionality. They
> are completely indepedent. All the forced alignment does is
> guarantees that allocation is aligned according the extent size hint
> on the inode or it fails with ENOSPC.

Fine, so only for atomic writes we just need to ensure FSBs are aligned 
to DBs.

And so it is the responsibility of mkfs to ensure AG size aligns to any 
forcealign extsize specified and also disk atomic write geometry.

For atomic write only, it is the responsibility of the kernel to check 
the forcealign extsize is compatible with any stripe alignment and AG size.

>>>
>>> Can you please separate these and put all the force align user API
>>> validation checks in the one function?
>>>
>> ok, fine. But it would be good to have clarification on function of
>> forcealign, above, i.e. does it always align extents to disk blocks?
> No, it doesn't. XFS has never done this - physical extent alignment
> is always done relative to the start of the AG, not the underlying
> disk geometry.
> 
> IOWs, forced alignement is not aligning to disk blocks at all - it
> is aligning extents logically to file offset and physically to the
> offset from the start of the allocation group.  Hence there are no
> real constraints on forced alignment - we can do any sort of
> alignment as long it is smaller than half the max size of a physical
> extent.
> 
> For allocation to then be aligned to physical storage, we need mkfs
> to physically align the start of each AG to the geometry of the
> underlying storage. We already do this for filesystems with a stripe
> unit defined, hence stripe aligned allocation is physically aligned
> to the underlying storage.

Sure

> 
> However, if mkfs doesn't get the physical layout of AGs right, there
> is nothing the mounted filesystem can do to guarantee extent
> allocation is aligned to physical disk blocks regardless of whether
> forced alignment is enabled or not...

ok, understood.

Thanks,
John