lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4b41c8df-bc56-4ff1-b2ed-70ec00080f95@oracle.com>
Date:   Tue, 5 Dec 2023 11:09:46 +0000
From:   John Garry <john.g.garry@...cle.com>
To:     Theodore Ts'o <tytso@....edu>
Cc:     Christoph Hellwig <hch@....de>, axboe@...nel.dk, kbusch@...nel.org,
        sagi@...mberg.me, jejb@...ux.ibm.com, martin.petersen@...cle.com,
        djwong@...nel.org, viro@...iv.linux.org.uk, brauner@...nel.org,
        chandan.babu@...cle.com, dchinner@...hat.com,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-nvme@...ts.infradead.org, linux-xfs@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, jbongio@...gle.com,
        linux-api@...r.kernel.org
Subject: Re: [PATCH 17/21] fs: xfs: iomap atomic write support

On 05/12/2023 04:55, Theodore Ts'o wrote:
>> AFAICS, this is without any kernel changes, so no guarantee of unwanted
>> splitting or merging of bios.
> Well, more than one company has audited the kernel paths, and it turns
> out that for selected Kernel versions, after doing desk-check
> verification of the relevant kernel baths, as well as experimental
> verification via testing to try to find torn writes in the kernel, we
> can make it safe for specific kernel versions which might be used in
> hosted MySQL instances where we control the kernel, the mysql server,
> and the emulated block device (and we know the database is doing
> Direct I/O writes --- this won't work for PostgreSQL).  I gave a talk
> about this at Google I/O Next '18, five years ago[1].
> 
> [1]https://urldefense.com/v3/__https://www.youtube.com/watch?v=gIeuiGg-_iw__;!!ACWV5N9M2RV99hQ!I4iRp4xUyzAT0UwuEcnUBBCPKLXFKfk5FNmysFbKcQYfl0marAll5xEEVyB5mMFDqeckCWLmjU1aCR2Z$  
> 
> Given the performance gains (see the talk (see the comparison of the
> at time 19:31 and at 29:57) --- it's quite compelling.
> 
> Of course, I wouldn't recommend this approach for a naive sysadmin,
> since most database adminsitrators won't know how to audit kernel code
> (see the discussion at time 35:10 of the video), and reverify the
> entire software stack before every kernel upgrade.

Sure

>  The challenge is
> how to do this safely.

Right, and that is why I would be concerned about advertising torn-write 
protection support, but someone has not gone through the effort of 
auditing and verification phase to ensure that this does not happen in 
their software stack ever.

> 
> The fact remains that both Amazon's EBS and Google's Persistent Disk
> products are implemented in such a way that writes will not be torn
> below the virtual machine, and the guarantees are in fact quite a bit
> stronger than what we will probably end up advertising via NVMe and/or
> SCSI.  It wouldn't surprise me if this is the case (or could be made
> to be the case) For Oracle Cloud as well.
> 
> The question is how to make this guarantee so that the kernel knows
> when various cloud-provided block devicse do provide these greater
> guarantees, and then how to make it be an architected feature, as
> opposed to a happy implementation detail that has to be verified at
> every kernel upgrade.

The kernel can only judge atomic write support from what the HW product 
data tells us, so cloud-provided block devices need to provide that 
information as best possible if emulating the some storage technology.

Thanks,
John

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ