Message-ID: <37cab50b-5791-4840-b7b7-c67d3878fced@oracle.com>
Date: Mon, 16 Dec 2024 08:40:34 +0000
From: John Garry <john.g.garry@...cle.com>
To: "Darrick J. Wong" <djwong@...nel.org>, Christoph Hellwig <hch@....de>
Cc: brauner@...nel.org, cem@...nel.org, dchinner@...hat.com,
ritesh.list@...il.com, linux-xfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
martin.petersen@...cle.com
Subject: Re: [PATCH v2 0/7] large atomic writes for xfs
>>
>> Yeah, at the low end, it may make sense to do the 512B write via DIO. But
>> OTOH sync'ing many redo log FS blocks at once at the high end can be more
>> efficient.
>>
>> From what I have heard, this was attempted before (using DIO) by some
>> vendor, but did not come to much.
>>
>> So it seems that we are stuck with this redo log limitation.
>>
>> Let me know if you have any other ideas to avoid large atomic writes...
>
> From the description it sounds like the redo log consists of 512b blocks
> that describe small changes to the 16k table file pages. If they're
> issuing 16k atomic writes to get each of those 512b redo log records to
> disk it's no wonder that cranks up the overhead substantially.

They are not issuing the redo log atomically. They do 512B buffered
writes and then periodically fsync.

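
For illustration only, the pattern described above (512B buffered writes followed by a periodic fsync) could be sketched roughly as below. This is not MySQL's code; the function name `append_redo_records`, the `sync_every` parameter and the batch size are all hypothetical, and the only detail taken from this thread is the 512B record size.

```python
import os

SECTOR = 512  # redo log record size mentioned in the thread


def append_redo_records(path, records, sync_every=8):
    """Append fixed-size records with buffered writes, fsync'ing periodically.

    Hypothetical helper: `sync_every` is an assumed batching knob, not
    something taken from MySQL. Returns the number of fsync calls issued.
    """
    syncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        pending = 0
        for rec in records:
            if len(rec) != SECTOR:
                raise ValueError("record must be exactly 512 bytes")
            # Buffered write: the data lands in the page cache, not on disk.
            view = memoryview(rec)
            while view:
                written = os.write(fd, view)
                view = view[written:]
            pending += 1
            if pending == sync_every:
                os.fsync(fd)  # flush the whole batch of records at once
                syncs += 1
                pending = 0
        if pending:
            os.fsync(fd)  # flush the final partial batch
            syncs += 1
    finally:
        os.close(fd)
    return syncs
```

With 20 records and `sync_every=8` this issues three fsync calls (8 + 8 + 4 records), which is the sense in which syncing many redo log blocks at once can be cheaper than persisting each 512B record individually.
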
> Also,
> replaying those tiny updates through the pagecache beats issuing a bunch
> of tiny nonlocalized writes.
>
> For the first case I don't know why they need atomic writes -- 512b redo
> log records can't be torn because they're single-sector writes. The
> second case might be better done with exchange-range.
>
As for exchange-range, that would very much pre-date any MySQL port.
Furthermore, I can't imagine that exchange-range support is portable to
other FSes, which is probably quite important. Anyway, they are not
issuing the redo log atomically, so I don't know if mentioning
exchange-range is relevant.

Regardless of what MySQL is specifically doing here, there are going to
be other users/applications which want to keep a 4K FS blocksize and do
larger atomic writes.

Thanks,
John