Message-Id: <22C71398-EFD7-4638-AAE4-CE7E30E95B7E@dilger.ca>
Date: Sat, 15 Sep 2018 12:04:51 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: 焦晓冬 <milestonejxd@...il.com>
Cc: Dave Chinner <david@...morbit.com>, cmumford@...mford.com,
linux-btrfs <linux-btrfs@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: metadata operation reordering regards to crash
On Sep 15, 2018, at 12:58 AM, 焦晓冬 <milestonejxd@...il.com> wrote:
>
> On Sat, Sep 15, 2018 at 6:23 AM Dave Chinner <david@...morbit.com> wrote:
>>
>> On Fri, Sep 14, 2018 at 05:06:44PM +0800, 焦晓冬 wrote:
>>> Hi, all,
>>>
>>> A probably somewhat complex question:
>>> Do today's practical filesystems, e.g., extX, btrfs, preserve metadata
>>> operation order through a crash/power failure?
>>
>> Yes.
>>
>> Behaviour is filesystem dependent, but we have tests in fstests that
>> specifically exercise order preservation across filesystem failures.
>>
>>> What I know is that modern filesystems ensure metadata consistency
>>> after a crash/power failure. Journaling filesystems like extX do that by
>>> write-ahead logging of metadata operations into transactions. Other
>>> filesystems do it in other ways; btrfs, for example, uses COW.
>>>
>>> What I'm not yet clear on is whether these filesystems preserve
>>> metadata operation order after a crash.
>>>
>>> For example,
>>> op 1. rename(A, B)
>>> op 2. rename(C, D)
>>>
>>> As mentioned above, metadata consistency is ensured after a crash.
>>> Thus, B is either the original B (or does not exist) or has been replaced
>>> by A. The same goes for D.
>>>
>>> Is it possible that, after a crash, D has been replaced by C but B is still
>>> the original file (or does not exist)?
>>
>> Not for XFS, ext4, btrfs or f2fs. Other filesystems might be
>> different.
>
> Thanks, Dave,
>
> I found this archive:
> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg31937.html
>
> It seems the btrfs people think reordering could happen.
>
> It is a relatively old reply. Has the implementation changed? Or is there
> some new standard that requires that reordering not happen?

There is nothing in POSIX that requires any particular ordering. However,
the sequence "A, B, C, sync C" on ext3/ext4 has "always" resulted in A and B
also being synced to disk (including parent directory creation, etc).

For a while, ext4 with delayed allocation meant that the sequence "write A,
rename A->B" could leave "B" without any data (see commit
v2.6.29-5120-g8750c6d).

While applications that rely on this are depending on non-POSIX behaviour,
the operation ordering behaviour has been around long enough that
applications have grown to depend on it, and consider the filesystem to have
a bug when it doesn't behave that way.

If you want to write a robust application, you should fsync() the files you
care about (possibly with AIO so you get a notification on completion rather
than waiting).
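
As an illustration of that advice, here is a minimal sketch of the "write A,
rename A->B" pattern with explicit fsync() calls, so that it does not depend
on filesystem-specific ordering. The helper names (write_all, fsync_dir,
durable_replace) and the ".tmp" suffix are hypothetical and chosen for this
example only. The sketch uses blocking fsync() rather than AIO, and also
syncs the parent directory, which is a common way to make the rename itself
durable.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying on short writes. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: fsync a directory so a rename inside it is on disk. */
static int fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Hypothetical helper: replace dir/name with the given contents.
 * Order: write temp file, fsync it, rename over the target, fsync the
 * directory. Returns 0 on success, -1 on error.
 */
int durable_replace(const char *dir, const char *name,
                    const char *buf, size_t len)
{
    char tmp[4096], dst[4096];
    int n;

    n = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof dst)
        return -1;
    n = snprintf(tmp, sizeof tmp, "%s/%s.tmp", dir, name);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The data must be on disk before the rename makes it visible as dst. */
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0 || rename(tmp, dst) < 0) {
        unlink(tmp);
        return -1;
    }
    return fsync_dir(dir);
}
```

Calling durable_replace() for B and then for D would, under these
assumptions, have B's replacement synced before the rename to D is even
issued, so the application does not rely on the filesystem preserving the
order of the two renames.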
Cheers, Andreas