linux-kernel - Re: [LSF/MM/BPF TOPIC] untorn buffered writes

Message-ID: <fc768320-f4a2-43d9-a7de-4441b60ced28@oracle.com>
Date: Tue, 11 Jun 2024 16:23:22 +0100
From: John Garry <john.g.garry@...cle.com>
To: Theodore Ts'o <tytso@....edu>
Cc: Luis Chamberlain <mcgrof@...nel.org>, David Bueso <dave@...olabs.net>,
        lsf-pc@...ts.linux-foundation.org, linux-fsdevel@...r.kernel.org,
        linux-mm <linux-mm@...ck.org>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Dave Chinner <david@...morbit.com>, linux-kernel@...r.kernel.org,
        catherine.hoang@...cle.com
Subject: Re: [LSF/MM/BPF TOPIC] untorn buffered writes

On 01/06/2024 10:33, Theodore Ts'o wrote:
> On Thu, May 23, 2024 at 12:59:57PM +0100, John Garry wrote:
>>
>> That's my point really. There was some positive discussion. I put across
>> the idea of implementing buffered atomic writes, and now I want to ensure
>> that everyone is satisfied with that going forward. I think that an LWN
>> report is now being written.
> 
> I checked in with some PostgreSQL developers after LSF/MM, and
> unfortunately, the idea of immediately sending atomic buffered I/O
> directly to the storage device is going to be problematic for them.

This was not my idea (for supporting buffered atomic writes).

As I remember, that was a candidate solution for the problem of how to
tag a buffered write as atomic, or how to deal with overlapping atomic
writes. That solution is to just write through, so we don't need to
remember whether the write was atomic.

For performance reasons, I was not keen on that, and prefer the solution 
I already mentioned earlier.

> The problem is that they depend on the database to coalesce writes for
> them.  So if they are doing a large database commit that involves
> touching hundreds or thousands of 16k database pages, they today issue
> a separate buffered write request for each database page.  So if we
> turn each one into an immediate SCSI/NVMe write request, that would be
> disastrous for performance. 

FWIW, atomic writes support merging in the block layer.

But, that aside, IMHO, talking about performance like this is close to 
speculation.
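
To make the coalescing point concrete, here is a minimal user-space
sketch, assuming 16k database pages as in the example quoted above. The
function name coalesce_pages and the page numbers are hypothetical; this
is only an illustration of adjacent page writes being merged into larger
requests, not the block layer's merge logic and not PostgreSQL code.

```python
from typing import Iterable, List, Tuple

PAGE_SIZE = 16 * 1024  # 16k database page, as in the quoted example


def coalesce_pages(page_numbers: Iterable[int],
                   page_size: int = PAGE_SIZE) -> List[Tuple[int, int]]:
    """Merge dirty page numbers into contiguous (offset, length) byte ranges.

    Hypothetical helper: pages that are adjacent in the file end up in a
    single range, so fewer, larger write requests would be issued.
    """
    ranges: List[Tuple[int, int]] = []
    for page in sorted(set(page_numbers)):
        offset = page * page_size
        if ranges and ranges[-1][0] + ranges[-1][1] == offset:
            # This page starts exactly where the previous range ends: extend it.
            prev_offset, prev_len = ranges[-1]
            ranges[-1] = (prev_offset, prev_len + page_size)
        else:
            # Gap before this page: start a new range.
            ranges.append((offset, page_size))
    return ranges


if __name__ == "__main__":
    # Six dirty pages collapse into three write requests.
    for offset, length in coalesce_pages([0, 1, 2, 5, 6, 9]):
        print(f"write offset={offset} length={length}")
```

How much merging actually happens for a real commit depends on how the
dirty pages are laid out, which is why measured figures matter more than
estimates.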

> Yes, when they migrate to using Direct
> I/O, the database is going to have to figure out how to coalesce write
> requests; but this is why it's going to take at least 3 years to make
> this migration (and some will call this hopelessly optimistic), and
> then users will probably wait another 3 to 5 years before they trust
> that the database rewrite to use Direct I/O will get it right and
> trust their enterprise workloads to it....
> 
> So I think this goes back to either (a) trying to track which writes
> we've promised atomic write semantics, or (b) using a completely
> different API that only promises "untorn writes with a specified
> granularity" for the untorn buffered writes I/O interface,
> in addition to, or instead of, the current "atomic write"
> interface which we are currently trying to promulgate for Direct I/O.
> 
> Personally, I'd advocate for two separate interfaces; one for "atomic"
> I/O's, and a different one for "untorn writes with a specified
> guaranteed granularity".  And if XFS folks want to turn the atomic I/O
> interface into something where you can do a multi-megabyte atomic
> write into something that requires allocating new blocks and
> atomically mutating the file system metadata to do this kind of
> atomicity --- even though the Database folks Don't Care --- God bless.

At this stage, if people want buffered atomic writes support for
PostgreSQL - and are not prepared to wait for or help with direct IO
support for that DB - then they need to design/extend a kernel API,
implement that, and then port PostgreSQL. Then the performance figures
can be seen. And then they can try to upstream kernel support.

We have already done such a thing for MySQL for direct IO. We know that 
the performance is good, and we want to support it in the kernel today.

> 
> But let's have something which *just* promises the guarantee requested
> by the primary requesters of this interface, at least for the
> buffered I/O case.
> 

I think that you need to decide whether you want to endorse our direct IO
support today (and give acked-by or similar), or live with probably
no support for any sort of atomic writes in the kernel...

Thanks,
John