lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2aced048-4d4b-4a48-9a45-049f73763697@oracle.com>
Date:   Mon, 4 Dec 2023 09:02:56 +0000
From:   John Garry <john.g.garry@...cle.com>
To:     Dave Chinner <david@...morbit.com>
Cc:     Ojaswin Mujoo <ojaswin@...ux.ibm.com>, linux-ext4@...r.kernel.org,
        Theodore Ts'o <tytso@....edu>,
        Ritesh Harjani <ritesh.list@...il.com>,
        linux-kernel@...r.kernel.org,
        "Darrick J . Wong" <djwong@...nel.org>,
        linux-block@...r.kernel.org, linux-xfs@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, dchinner@...hat.com
Subject: Re: [RFC 1/7] iomap: Don't fall back to buffered write if the write
 is atomic

On 01/12/2023 22:07, Dave Chinner wrote:
>> Sure, and I think that we need a better story for supporting buffered IO for
>> atomic writes.
>>
>> Currently we have:
>> - man pages tell us RWF_ATOMIC is only supported for direct IO
>> - statx gives atomic write unit min/max, not explicitly telling us it's for
>> direct IO
>> - RWF_ATOMIC is ignored for !O_DIRECT
>>
>> So I am thinking of expanding statx support to enable querying of atomic
>> write capabilities for buffered IO and direct IO separately.
> You're over complicating this way too much by trying to restrict the
> functionality down to just what you want to implement right now.
> 
> RWF_ATOMIC is no different to RWF_NOWAIT. The API doesn't decide
> what can be supported - the filesystems themselves decide what part
> of the API they can support and implement those pieces.

Sure, but for RWF_ATOMIC we still have the associated statx call to tell 
us whether atomic writes are supported for a file and the specific range 
capability.

> 
> TO go back to RWF_NOWAIT, for a long time we (XFS) only supported
> RWF_NOWAIT on DIO, and buffered reads and writes were given
> -EOPNOTSUPP by the filesystem. Then other filesystems started
> supporting DIO with RWF_NOWAIT. Then buffered read support was added
> to the page cache and XFS, and as other filesystems were converted
> they removed the RWF_NOWAIT exclusion check from their read IO
> paths.
> 
> We are now in the same place with buffered write support for
> RWF_NOWAIT. XFS, the page cache and iomap allow buffered writes w/
> RWF_NOWAIT, but ext4, btrfs and f2fs still all return -EOPNOTSUPP
> because they don't support non-blocking buffered writes yet.
> 
> This is the same model we should be applying with RWF_ATOMIC - we
> know that over time we'll be able to expand support for atomic
> writes across both direct and buffered IO, so we should not be
> restricting the API or infrastructure to only allow RWF_ATOMIC w/
> DIO.

Agreed.

> Just have the filesystems reject RWF_ATOMIC w/ -EOPNOTSUPP if
> they don't support it,

Yes, I was going to add this regardless.

> and for those that do it is conditional on
> whther the filesystem supports it for the given type of IO being
> done.
> 
> Seriously - an application can easily probe for RWF_ATOMIC support
> without needing information to be directly exposed in statx() - just
> open a O_TMPFILE, issue the type of RWF_ATOMIC IO you require to be
> supported, and if it returns -EOPNOTSUPP then it you can't use
> RWF_ATOMIC optimisations in the application....

ok, if that is the done thing.

So I can't imagine that atomic write unit range will be different for 
direct IO and buffered IO (ignoring for a moment Christoph's idea for 
CoW always for no HW offload) when supported. But it seems that we may 
have a scenario where statx tells is that atomic writes are supported 
for a file, and a DIO write succeeds and a buffered IO write may return 
-EOPNOTSUPP. If that's acceptable then I'll work towards that.

If we could just run statx on a file descriptor here then that would be 
simpler...

Thanks,
John


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ