Message-ID: <4b41c8df-bc56-4ff1-b2ed-70ec00080f95@oracle.com>
Date: Tue, 5 Dec 2023 11:09:46 +0000
From: John Garry <john.g.garry@...cle.com>
To: Theodore Ts'o <tytso@....edu>
Cc: Christoph Hellwig <hch@....de>, axboe@...nel.dk, kbusch@...nel.org,
sagi@...mberg.me, jejb@...ux.ibm.com, martin.petersen@...cle.com,
djwong@...nel.org, viro@...iv.linux.org.uk, brauner@...nel.org,
chandan.babu@...cle.com, dchinner@...hat.com,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-nvme@...ts.infradead.org, linux-xfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, jbongio@...gle.com,
linux-api@...r.kernel.org
Subject: Re: [PATCH 17/21] fs: xfs: iomap atomic write support
On 05/12/2023 04:55, Theodore Ts'o wrote:
>> AFAICS, this is without any kernel changes, so no guarantee of unwanted
>> splitting or merging of bios.
> Well, more than one company has audited the kernel paths, and it turns
> out that for selected kernel versions, after doing desk-check
> verification of the relevant kernel paths, as well as experimental
> verification via testing to try to find torn writes in the kernel, we
> can make it safe for specific kernel versions which might be used in
> hosted MySQL instances where we control the kernel, the mysql server,
> and the emulated block device (and we know the database is doing
> Direct I/O writes --- this won't work for PostgreSQL). I gave a talk
> about this at Google I/O Next '18, five years ago[1].
>
> [1]https://urldefense.com/v3/__https://www.youtube.com/watch?v=gIeuiGg-_iw__;!!ACWV5N9M2RV99hQ!I4iRp4xUyzAT0UwuEcnUBBCPKLXFKfk5FNmysFbKcQYfl0marAll5xEEVyB5mMFDqeckCWLmjU1aCR2Z$
>
> Given the performance gains (see the comparison in the talk at time
> 19:31 and at 29:57), it's quite compelling.
>
> Of course, I wouldn't recommend this approach for a naive sysadmin,
> since most database administrators won't know how to audit kernel code
> (see the discussion at time 35:10 of the video), and reverify the
> entire software stack before every kernel upgrade.
Sure
> The challenge is
> how to do this safely.
Right, and that is why I would be concerned about advertising torn-write
protection support when someone has not gone through the effort of an
auditing and verification phase to ensure that torn writes never happen
in their software stack.
>
> The fact remains that both Amazon's EBS and Google's Persistent Disk
> products are implemented in such a way that writes will not be torn
> below the virtual machine, and the guarantees are in fact quite a bit
> stronger than what we will probably end up advertising via NVMe and/or
> SCSI. It wouldn't surprise me if this is the case (or could be made
> to be the case) for Oracle Cloud as well.
>
> The question is how to make this guarantee so that the kernel knows
> when various cloud-provided block devices do provide these greater
> guarantees, and then how to make it be an architected feature, as
> opposed to a happy implementation detail that has to be verified at
> every kernel upgrade.
The kernel can only judge atomic write support from what the HW product
data tells us, so cloud-provided block devices need to provide that
information as best as possible if emulating some storage technology.
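
As a rough userspace illustration of that point, the sketch below reads
the atomic write limits the block layer exposes for a disk and treats a
missing or zero value as "not advertised". The sysfs attribute names are
an assumption here (they are the queue attributes proposed by the
block-layer part of this work), and the helper names are hypothetical:

```python
from pathlib import Path

# Assumed names of the per-queue sysfs attributes for atomic write limits.
ATOMIC_ATTRS = (
    "atomic_write_unit_min_bytes",
    "atomic_write_unit_max_bytes",
    "atomic_write_max_bytes",
    "atomic_write_boundary_bytes",
)


def read_atomic_limits(disk, sysfs_root="/sys/block"):
    """Return {attribute: int or None} for the given disk name."""
    queue = Path(sysfs_root) / disk / "queue"
    limits = {}
    for name in ATOMIC_ATTRS:
        try:
            limits[name] = int((queue / name).read_text().strip())
        except (OSError, ValueError):
            # Attribute absent (older kernel, no such disk) or unparsable.
            limits[name] = None
    return limits


def advertises_atomic_writes(limits):
    """True only if the device reports a non-zero maximum atomic unit."""
    unit_max = limits.get("atomic_write_unit_max_bytes")
    return unit_max is not None and unit_max > 0


if __name__ == "__main__":
    import sys

    disk = sys.argv[1] if len(sys.argv) > 1 else "sda"
    limits = read_atomic_limits(disk)
    print(disk, limits, "atomic:", advertises_atomic_writes(limits))
```

An emulated device that does not report these limits would show up here
as zero or absent, regardless of what the backing store can really do.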
Thanks,
John