lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zhg0_Pvlh9zy4zzG@bombadil.infradead.org>
Date: Thu, 11 Apr 2024 12:07:40 -0700
From: Luis Chamberlain <mcgrof@...nel.org>
To: John Garry <john.g.garry@...cle.com>
Cc: Matthew Wilcox <willy@...radead.org>,
	Pankaj Raghav <p.raghav@...sung.com>,
	Daniel Gomez <da.gomez@...sung.com>,
	Javier González <javier.gonz@...sung.com>,
	axboe@...nel.dk, kbusch@...nel.org, hch@....de, sagi@...mberg.me,
	jejb@...ux.ibm.com, martin.petersen@...cle.com, djwong@...nel.org,
	viro@...iv.linux.org.uk, brauner@...nel.org, dchinner@...hat.com,
	jack@...e.cz, linux-block@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
	linux-fsdevel@...r.kernel.org, tytso@....edu, jbongio@...gle.com,
	linux-scsi@...r.kernel.org, ojaswin@...ux.ibm.com,
	linux-aio@...ck.org, linux-btrfs@...r.kernel.org,
	io-uring@...r.kernel.org, nilay@...ux.ibm.com,
	ritesh.list@...il.com
Subject: Re: [PATCH v6 00/10] block atomic writes

On Wed, Apr 10, 2024 at 09:34:36AM +0100, John Garry wrote:
> On 08/04/2024 18:50, Luis Chamberlain wrote:
> > I agree that when you don't set the sector size to 16k you are not forcing the
> > filesystem to use 16k IOs, the metadata can still be 4k. But when you
> > use a 16k sector size, the 16k IOs should be respected by the
> > filesystem.
> > 
> > Do we break BIOs to below a min order if the sector size is also set to
> > 16k?  I haven't seen that and its unclear when or how that could happen.
> 
> AFAICS, the only guarantee is to not split below LBS.

It would be odd to split a BIO given a inode requirement size spelled
out, but indeed I don't recall verifying this gaurantee.

> > At least for NVMe we don't need to yell to a device to inform it we want
> > a 16k IO issued to it to be atomic, if we read that it has the
> > capability for it, it just does it. The IO verificaiton can be done with
> > blkalgn [0].
> > 
> > Does SCSI*require*  an 16k atomic prep work, or can it be done implicitly?
> > Does it need WRITE_ATOMIC_16?
> 
> physical block size is what we can implicitly write atomically.

Yes, and also on flash to avoid read modify writes.

> So if you
> have a 4K PBS and 512B LBS, then WRITE_ATOMIC_16 would be required to write
> 16KB atomically.

Ugh. Why does SCSI requires a special command for this?

Now we know what would be needed to bump the physical block size, it is
certainly a different feature, however I think it would be good to
evaluate that world too. For NVMe we don't have such special write
requirements.

I put together this kludge with the last patches series of LBS + the
bdev cache aops stuff (which as I said before needs an alternative
solution) and just the scsi atomics topology + physical block size
change to easily experiment to see what would break:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20240408-lbs-scsi-kludge

Using a larger sector size works but it does not use the special scsi
atomic write.

> > > To me, O_ATOMIC would be required for buffered atomic writes IO, as we want
> > > a fixed-sized IO, so that would mean no mixing of atomic and non-atomic IO.
> > Would using the same min and max order for the inode work instead?
> 
> Maybe, I would need to check further.

I'd be happy to help review too.

  Luis

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ