linux-ext4 - Re: [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190531232852.GG29573@dread.disaster.area>
Date:   Sat, 1 Jun 2019 09:28:52 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Theodore Ts'o <tytso@....edu>
Cc:     Amir Goldstein <amir73il@...il.com>, Jan Kara <jack@...e.cz>,
        "Darrick J . Wong" <darrick.wong@...cle.com>,
        Chris Mason <clm@...com>, Al Viro <viro@...iv.linux.org.uk>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        linux-xfs <linux-xfs@...r.kernel.org>,
        Ext4 <linux-ext4@...r.kernel.org>,
        Linux Btrfs <linux-btrfs@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>
Subject: Re: [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC_METADATA

On Sat, Jun 01, 2019 at 08:45:49AM +1000, Dave Chinner wrote:
> Given that we can already use AIO to provide this sort of ordering,
> and AIO is vastly faster than synchronous IO, I don't see any point
> in adding complex barrier interfaces that can be /easily implemented
> in userspace/ using existing AIO primitives. You should start
> thinking about expanding libaio with stuff like
> "link_after_fdatasync()" and suddenly the whole problem of
> filesystem data vs metadata ordering goes away because the
> application directly controls all ordering without blocking and
> doesn't need to care what the filesystem under it does....

And let me point out that this is also how userspace can do an
efficient atomic rename - rename_after_fdatasync(). i.e. on
completion of the AIO_FSYNC, run the rename. This guarantees that
the application will see either the old file of the complete new
file, and it *doesn't have to wait for the operation to complete*.
Once it is in flight, the file will contain the old data until some
point in the near future when will it contain the new data....

Seriously, sit down and work out all the "atomic" data vs metadata
behaviours you want, and then tell me how many of them cannot be
implemented as "AIO_FSYNC w/ completion callback function" in
userspace. This mechanism /guarantees ordering/ at the application
level, the application does not block waiting for these data
integrity operations to complete, and you don't need any new kernel
side functionality to implement this.

Fundamentally, the assertion that disk cache flushes are not what
causes fsync "to be slow" is incorrect. It's the synchronous
"waiting for IO completion" that makes fsync "slow". AIO_FSYNC
avoids needing to wait for IO completion, allowing the application
to do useful work (like issue more DI ops) while data integrity
operations are in flight. At this point, fsync is no longer a "slow"
operation - it's just another background async data flush operation
like the BDI flusher thread...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com