Message-ID: <d574743c-f0b7-4030-84c3-da334b02466a@oracle.com>
Date: Thu, 5 Dec 2024 11:51:23 +0000
From: John Garry <john.g.garry@...cle.com>
To: "Darrick J. Wong" <djwong@...nel.org>, Dave Chinner <david@...morbit.com>
Cc: brauner@...nel.org, cem@...nel.org, dchinner@...hat.com, hch@....de,
ritesh.list@...il.com, linux-xfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
martin.petersen@...cle.com
Subject: Re: [PATCH 1/4] iomap: Lift blocksize restriction on atomic writes
On 05/12/2024 06:30, Darrick J. Wong wrote:
> On Thu, Dec 05, 2024 at 07:35:45AM +1100, Dave Chinner wrote:
>> On Wed, Dec 04, 2024 at 03:43:41PM +0000, John Garry wrote:
>>> From: "Ritesh Harjani (IBM)" <ritesh.list@...il.com>
>>>
>>> Filesystems like ext4 can submit writes in multiples of blocksizes.
>>> But we still can't allow the writes to be split into multiple BIOs. Hence
>>> let's check if the iomap_length() is same as iter->len or not.
>>>
>>> It is the responsibility of userspace to ensure that a write does not span
>>> mixed unwritten and mapped extents (which would lead to multiple BIOs).
>>
>> How is "userspace" supposed to do this?
>>
>> No existing utility in userspace is aware of atomic write limits or
>> rtextsize configs, so how does "userspace" ensure everything is
>> laid out in a manner compatible with atomic writes?
>>
>> e.g. restoring a backup (or other disaster recovery procedures) is
>> going to have to lay the files out correctly for atomic writes.
>> backup tools often sparsify the data set and so what gets restored
>> will not have the same layout as the original data set...
>>
>> Where's the documentation that outlines all the restrictions on
>> userspace behaviour to prevent this sort of problem being triggered?
>> Common operations such as truncate, hole punch, buffered writes,
>> reflinks, etc will trip over this, so application developers, users
>> and admins really need to know what they should be doing to avoid
>> stepping on this landmine...
>
> I'm kinda assuming that this requires forcealign to get the extent
> alignments correct, and writing zeroes non-atomically if the extent
> state gets mixed up before retrying the untorn write. John?
Sure, the code to do the automatic pre-zeroing and retry the atomic
write is not super complicated.
It's just a matter of whether we add it or not.
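
For illustration only, the pre-zero-and-retry flow discussed above could be
modelled roughly as follows. This is a userspace toy, not iomap or XFS code:
struct toy_file, range_is_uniform(), prezero_range(), atomic_write() and the
4096-byte block size are all hypothetical names and values made up for this
sketch. It only shows the ordering of the steps: detect mixed extent state,
zero the unwritten blocks non-atomically, then issue the untorn write over
what is now a single mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096u /* hypothetical filesystem block size */
#define NR_BLOCKS  16u   /* size of the toy file, in blocks */

enum blk_state { BLK_UNWRITTEN, BLK_MAPPED };

/* Toy file: per-block extent state plus data. Not a kernel structure. */
struct toy_file {
	enum blk_state state[NR_BLOCKS];
	unsigned char data[NR_BLOCKS][BLOCK_SIZE];
};

/*
 * True if blocks [first, first + count) all share one extent state, i.e.
 * the range could be covered by a single mapping in this model.
 */
static bool range_is_uniform(const struct toy_file *f, size_t first,
			     size_t count)
{
	for (size_t i = 1; i < count; i++)
		if (f->state[first + i] != f->state[first])
			return false;
	return true;
}

/* Non-atomic step: zero the unwritten blocks so that they become mapped. */
static void prezero_range(struct toy_file *f, size_t first, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (f->state[first + i] == BLK_UNWRITTEN) {
			memset(f->data[first + i], 0, BLOCK_SIZE);
			f->state[first + i] = BLK_MAPPED;
		}
	}
}

/*
 * Model of an untorn write of 'count' blocks from 'buf'.
 * Returns the number of attempts: 1 if the range was already uniform,
 * 2 if it had to be pre-zeroed before the retry.
 */
static int atomic_write(struct toy_file *f, size_t first, size_t count,
			const unsigned char *buf)
{
	int attempts = 1;

	assert(count > 0 && first + count <= NR_BLOCKS);

	if (!range_is_uniform(f, first, count)) {
		/* Mixed unwritten/mapped state: fix it up, then retry. */
		prezero_range(f, first, count);
		attempts++;
	}

	/* The range is uniform now; model the write as one unit. */
	for (size_t i = 0; i < count; i++) {
		memcpy(f->data[first + i], buf + i * BLOCK_SIZE, BLOCK_SIZE);
		f->state[first + i] = BLK_MAPPED;
	}
	return attempts;
}
```

In this model a write over one mapped block surrounded by unwritten blocks
takes two attempts, and repeating the same write afterwards takes one. The
real thing would additionally need forcealign (or equivalent) to get the
extent alignment right, which the toy does not model at all.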
Thanks,
John