linux-kernel - Re: [PATCH 10/21] block: Add fops atomic write support

Message-ID: <ZR3gXHfIpn3eybh0@dread.disaster.area>
Date:   Thu, 5 Oct 2023 08:59:56 +1100
From:   Dave Chinner <david@...morbit.com>
To:     Bart Van Assche <bvanassche@....org>
Cc:     John Garry <john.g.garry@...cle.com>, axboe@...nel.dk,
        kbusch@...nel.org, hch@....de, sagi@...mberg.me,
        jejb@...ux.ibm.com, martin.petersen@...cle.com, djwong@...nel.org,
        viro@...iv.linux.org.uk, brauner@...nel.org,
        chandan.babu@...cle.com, dchinner@...hat.com,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-nvme@...ts.infradead.org, linux-xfs@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, tytso@....edu, jbongio@...gle.com,
        linux-api@...r.kernel.org
Subject: Re: [PATCH 10/21] block: Add fops atomic write support

On Wed, Oct 04, 2023 at 10:34:13AM -0700, Bart Van Assche wrote:
> On 10/4/23 02:14, John Garry wrote:
> > On 03/10/2023 17:45, Bart Van Assche wrote:
> > > On 10/3/23 01:37, John Garry wrote:
> > > > I don't think that is_power_of_2(write length) is specific to XFS.
> > > 
> > > I think this is specific to XFS. Can you show me the F2FS code that
> > > restricts the length of an atomic write to a power of two? I haven't
> > > found it. The only power-of-two check that I found in F2FS is the
> > > following (maybe I overlooked something):
> > > 
> > > $ git grep -nH is_power fs/f2fs
> > > fs/f2fs/super.c:3914:    if (!is_power_of_2(zone_sectors)) {
> > 
> > Any use cases which we know of require a power-of-2 block size.
> > 
> > Do you know of a requirement for other sizes? Or are you concerned that
> > it is unnecessarily restrictive?
> > 
> > We have to deal with HW features like atomic write boundary and FS
> > restrictions like extent and stripe alignment transparently, which are
> > almost always powers-of-2, so naturally we would want to work with
> > powers-of-2 for atomic write sizes.
> > 
> > The power-of-2 stuff could be dropped if that is what people want.
> > However we still want to provide a set of rules to the user to make
> > those HW and FS features mentioned transparent to the user.
> 
> Hi John,
> 
> My concern is that the power-of-2 requirements are only needed for
> traditional filesystems and not for log-structured filesystems (BTRFS,
> F2FS, BCACHEFS).

Filesystems that support copy-on-write data (needed for arbitrary
filesystem block aligned RWF_ATOMIC support) are not necessarily log
structured. For example: XFS.

All three of the filesystems you list above still use power-of-2
block sizes for most of their metadata structures and for large data
extents. Hence once you go above a certain file size they are going
to be doing full power-of-2 block size aligned IO anyway. Hence the
constraint of atomic writes needing to be power-of-2 block size
aligned to avoid RMW cycles doesn't really change for these
filesystems.

In which case, they can just set their minimum atomic IO size to be
the same as their block size (e.g. 4kB) and set the maximum to
something they can guarantee gets COW'd in a single atomic
transaction. What the hardware can do with REQ_ATOMIC IO is
completely irrelevant at this point....
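
As a rough illustration of the min/max scheme above, here is a minimal
userspace sketch in C. The struct and function names are hypothetical
and are not kernel API. The natural-alignment rule (offset is a multiple
of the length) is an assumption about what the rules under discussion
require, and the example limits are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-file atomic write limits (illustrative only). */
struct atomic_limits {
	uint64_t min_bytes;	/* e.g. the filesystem block size, 4096 */
	uint64_t max_bytes;	/* largest size the fs can COW in one transaction */
};

static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Return true if a write of len bytes at byte offset pos fits the limits.
 * Assumed rules: len is a power of 2, min <= len <= max, and pos is
 * naturally aligned to len.
 */
static bool atomic_write_ok(const struct atomic_limits *lim,
			    uint64_t pos, uint64_t len)
{
	assert(lim != NULL);

	if (!is_pow2(len))
		return false;
	if (len < lim->min_bytes || len > lim->max_bytes)
		return false;
	/* len is a power of 2, so the mask test checks natural alignment */
	return (pos & (len - 1)) == 0;
}
```

Nothing in this check consults the hardware: the limits come from the
filesystem alone, which is the point being made above.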

> What I'd like to see is that each filesystem declares its atomic write
> requirements (in struct address_space_operations?) and that
> blkdev_atomic_write_valid() checks the filesystem-specific atomic write
> requirements.

That seems unworkable to me - IO constraints propagate from the
bottom up, not from the top down.

Consider multi-device filesystems (btrfs and XFS), where different
devices might have different atomic write parameters.  Which
set of bdev parameters does the filesystem report to the querying
bdev?  (And doesn't that question just sound completely wrong?)

It also doesn't work for filesystems that can configure extent
allocation alignment at an individual inode level (like XFS) - what
does the filesystem report to the device when it doesn't know what
alignment constraints individual on-disk inodes might be using?

That's why statx() vectors through filesystems to allow them to set
their own parameters based on the inode statx() is being called on.
If the filesystem has a native RWF_ATOMIC implementation, it can put
its own parameters in the statx min/max atomic write size fields.
If the fs doesn't have its own native support, but can do physical
file offset/LBA alignment, then it publishes the block device atomic
support parameters or overrides them with its internal allocation
alignment constraints. If the bdev doesn't support REQ_ATOMIC, the
filesystem says "atomic writes are not supported".
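
The decision order in the paragraph above could be sketched roughly as
follows, in userspace C. Every name here is hypothetical and none of it
is the actual statx or filesystem interface. Two details are assumptions
made for illustration: "override" is modelled as clamping the maximum to
the inode's allocation alignment, and a filesystem with neither native
support nor alignment capability reports "not supported".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* min == max == 0 means "atomic writes are not supported". */
struct atomic_params {
	uint32_t min_bytes;
	uint32_t max_bytes;
};

/* Hypothetical per-inode view of what the fs and bdev can do. */
struct inode_info {
	bool fs_native_atomic;		/* fs implements RWF_ATOMIC itself */
	struct atomic_params fs_native;	/* its own limits in that case */
	bool fs_can_align;		/* physical file offset/LBA alignment */
	uint32_t alloc_align_bytes;	/* per-inode alignment, 0 = none */
	bool bdev_req_atomic;		/* bdev supports REQ_ATOMIC */
	struct atomic_params bdev;	/* bdev limits in that case */
};

static struct atomic_params report_atomic_params(const struct inode_info *ino)
{
	const struct atomic_params none = { 0, 0 };
	struct atomic_params p;

	assert(ino != NULL);

	/* Native support: the fs reports its own limits. */
	if (ino->fs_native_atomic)
		return ino->fs_native;

	/* No native support: the fs depends on the bdev and on alignment. */
	if (!ino->bdev_req_atomic || !ino->fs_can_align)
		return none;

	/* Start from the bdev limits, then apply the inode's constraint. */
	p = ino->bdev;
	if (ino->alloc_align_bytes && ino->alloc_align_bytes < p.max_bytes)
		p.max_bytes = ino->alloc_align_bytes;
	if (p.max_bytes < p.min_bytes)
		return none;
	return p;
}
```

Because the result is computed per inode, two files on the same
filesystem can legitimately report different limits, which a single
filesystem-wide declaration could not express.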

-Dave.
-- 
Dave Chinner
david@...morbit.com