Message-ID: <20190531232852.GG29573@dread.disaster.area>
Date: Sat, 1 Jun 2019 09:28:52 +1000
From: Dave Chinner <david@...morbit.com>
To: Theodore Ts'o <tytso@....edu>
Cc: Amir Goldstein <amir73il@...il.com>, Jan Kara <jack@...e.cz>,
	"Darrick J . Wong" <darrick.wong@...cle.com>,
	Chris Mason <clm@...com>, Al Viro <viro@...iv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	linux-xfs <linux-xfs@...r.kernel.org>,
	Ext4 <linux-ext4@...r.kernel.org>,
	Linux Btrfs <linux-btrfs@...r.kernel.org>,
	Linux API <linux-api@...r.kernel.org>
Subject: Re: [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC_METADATA

On Sat, Jun 01, 2019 at 08:45:49AM +1000, Dave Chinner wrote:
> Given that we can already use AIO to provide this sort of ordering,
> and AIO is vastly faster than synchronous IO, I don't see any point
> in adding complex barrier interfaces that can be /easily implemented
> in userspace/ using existing AIO primitives. You should start
> thinking about expanding libaio with stuff like
> "link_after_fdatasync()" and suddenly the whole problem of
> filesystem data vs metadata ordering goes away because the
> application directly controls all ordering without blocking and
> doesn't need to care what the filesystem under it does....

And let me point out that this is also how userspace can do an
efficient atomic rename - rename_after_fdatasync(). i.e. on
completion of the AIO_FSYNC, run the rename. This guarantees that
the application will see either the old file or the complete new
file, and it *doesn't have to wait for the operation to complete*.
Once it is in flight, the file will contain the old data until some
point in the near future when it will contain the new data....

Seriously, sit down and work out all the "atomic" data vs metadata
behaviours you want, and then tell me how many of them cannot be
implemented as "AIO_FSYNC w/ completion callback function" in
userspace. This mechanism /guarantees ordering/ at the application
level, the application does not block waiting for these data
integrity operations to complete, and you don't need any new kernel
side functionality to implement this.

Fundamentally, the assertion that disk cache flushes are what cause
fsync "to be slow" is incorrect. It's the synchronous "waiting for
IO completion" that makes fsync "slow". AIO_FSYNC avoids needing to
wait for IO completion, allowing the application to do useful work
(like issue more DI ops) while data integrity operations are in
flight. At this point, fsync is no longer a "slow" operation - it's
just another background async data flush operation like the BDI
flusher thread...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
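To make the pattern above concrete, here is a minimal sketch of a
userspace "rename_after_fdatasync()" built only from existing AIO
primitives. It assumes libaio is available (link with -laio) and the
kernel implements AIO fdsync (IOCB_CMD_FDSYNC); the file names and the
simple io_getevents() wait are illustrative only, not part of any
proposed API.

/*
 * Minimal sketch of "rename_after_fdatasync()" built from existing
 * AIO primitives, along the lines described above. Assumptions:
 * libaio is available (link with -laio) and the kernel implements
 * AIO fdsync (IOCB_CMD_FDSYNC). The file names are illustrative only.
 */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	int fd;

	/* Write the new contents to a temporary file first. */
	fd = open("file.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0 || io_setup(1, &ctx) < 0) {
		perror("setup");
		return 1;
	}

	/* ... write() the new data to fd here ... */

	/* Queue an async fdatasync instead of blocking in fdatasync(). */
	io_prep_fdsync(&cb, fd);
	if (io_submit(ctx, 1, cbs) != 1) {
		perror("io_submit");
		return 1;
	}

	/*
	 * The application can keep doing useful work here while the
	 * data integrity operation is in flight.
	 */

	/* On completion of the AIO fdsync, run the rename. */
	if (io_getevents(ctx, 1, 1, &ev, NULL) == 1 && ev.res == 0) {
		if (rename("file.tmp", "file") < 0)
			perror("rename");
	}

	close(fd);
	io_destroy(ctx);
	return 0;
}

In a real application the io_getevents() wait would be replaced by
reaping the completion from the normal event loop, so the rename is
issued as soon as the fdatasync completes and the application never
blocks on the data integrity operation; an observer sees either the
old file or the complete new file, as described above.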