lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZgOXb_oZjsUU12YL@casper.infradead.org>
Date: Wed, 27 Mar 2024 03:50:07 +0000
From: Matthew Wilcox <willy@...radead.org>
To: John Garry <john.g.garry@...cle.com>
Cc: axboe@...nel.dk, kbusch@...nel.org, hch@....de, sagi@...mberg.me,
	jejb@...ux.ibm.com, martin.petersen@...cle.com, djwong@...nel.org,
	viro@...iv.linux.org.uk, brauner@...nel.org, dchinner@...hat.com,
	jack@...e.cz, linux-block@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
	linux-fsdevel@...r.kernel.org, tytso@....edu, jbongio@...gle.com,
	linux-scsi@...r.kernel.org, ojaswin@...ux.ibm.com,
	linux-aio@...ck.org, linux-btrfs@...r.kernel.org,
	io-uring@...r.kernel.org, nilay@...ux.ibm.com,
	ritesh.list@...il.com
Subject: Re: [PATCH v6 00/10] block atomic writes

On Tue, Mar 26, 2024 at 01:38:03PM +0000, John Garry wrote:
> The goal here is to provide an interface that allows applications use
> application-specific block sizes larger than logical block size
> reported by the storage device or larger than filesystem block size as
> reported by stat().
> 
> With this new interface, application blocks will never be torn or
> fractured when written. For a power fail, for each individual application
> block, all or none of the data to be written. A racing atomic write and
> read will mean that the read sees all the old data or all the new data,
> but never a mix of old and new.
> 
> Three new fields are added to struct statx - atomic_write_unit_min,
> atomic_write_unit_max, and atomic_write_segments_max. For each atomic
> individual write, the total length of a write must be a between
> atomic_write_unit_min and atomic_write_unit_max, inclusive, and a
> power-of-2. The write must also be at a natural offset in the file
> wrt the write length. For pwritev2, iovcnt is limited by
> atomic_write_segments_max.
> 
> There has been some discussion on supporting buffered IO and whether the
> API is suitable, like:
> https://lore.kernel.org/linux-nvme/ZeembVG-ygFal6Eb@casper.infradead.org/
> 
> Specifically the concern is that supporting a range of sizes of atomic IO
> in the pagecache is complex to support. For this, my idea is that FSes can
> fix atomic_write_unit_min and atomic_write_unit_max at the same size, the
> extent alignment size, which should be easier to support. We may need to
> implement O_ATOMIC to avoid mixing atomic and non-atomic IOs for this. I
> have no proposed solution for atomic write buffered IO for bdev file
> operations, but I know of no requirement for this.

The thing is that there's no requirement for an interface as complex as
the one you're proposing here.  I've talked to a few database people
and all they want is to increase the untorn write boundary from "one
disc block" to one database block, typically 8kB or 16kB.

So they would be quite happy with a much simpler interface where they
set the inode block size at inode creation time, and then all writes to
that inode were guaranteed to be untorn.  This would also be simpler to
implement for buffered writes.

Who's asking for this more complex interface?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ