linux-ext4 - Re: metadata operation reordering regards to crash

Message-Id: <22C71398-EFD7-4638-AAE4-CE7E30E95B7E@dilger.ca>
Date:   Sat, 15 Sep 2018 12:04:51 -0600
From:   Andreas Dilger <adilger@...ger.ca>
To:     焦晓冬 <milestonejxd@...il.com>
Cc:     Dave Chinner <david@...morbit.com>, cmumford@...mford.com,
        linux-btrfs <linux-btrfs@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: metadata operation reordering regards to crash

On Sep 15, 2018, at 12:58 AM, 焦晓冬 <milestonejxd@...il.com> wrote:
> 
> On Sat, Sep 15, 2018 at 6:23 AM Dave Chinner <david@...morbit.com> wrote:
>> 
>> On Fri, Sep 14, 2018 at 05:06:44PM +0800, 焦晓冬 wrote:
>>> Hi, all,
>>> 
>>> A possibly somewhat complex question:
>>> Do today's practical filesystems, e.g., extX and btrfs, preserve metadata
>>> operation order through a crash/power failure?
>> 
>> Yes.
>> 
>> Behaviour is filesystem dependent, but we have tests in fstests that
>> specifically exercise order preservation across filesystem failures.
>> 
>>> What I know is that modern filesystems ensure metadata consistency
>>> after a crash/power failure. Journaling filesystems like extX do that by
>>> write-ahead logging of metadata operations into transactions. Other
>>> filesystems do it in other ways; btrfs, for example, uses COW.
>>> 
>>> What I'm not yet clear on is whether these filesystems preserve
>>> metadata operation order after a crash.
>>> 
>>> For example,
>>> op 1.  rename(A, B)
>>> op 2.  rename(C, D)
>>> 
>>> As mentioned above, metadata consistency is ensured after a crash.
>>> Thus, B is either the original B (or does not exist) or has been replaced by A.
>>> The same goes for D.
>>> 
>>> Is it possible that, after a crash, D has been replaced by C but B is still
>>> the original file (or does not exist)?
>> 
>> Not for XFS, ext4, btrfs or f2fs. Other filesystems might be
>> different.
> 
> Thanks, Dave,
> 
> I found this archive:
> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg31937.html
> 
> It seems the btrfs people think reordering could happen.
> 
> It is a relatively old reply. Has the implementation changed? Or is there
> some new standard that requires that reordering not happen?

There is nothing in POSIX that requires any particular ordering.  However,
the sequence "A, B, C, sync C" on ext3/ext4 has "always" resulted in A, B
also being sync'd to disk (including parent directory creation, etc).

For a while, ext4 with delayed allocation resulted in write A, rename A->B
causing "B" to potentially not have any data (commit v2.6.29-5120-g8750c6d).
While the applications are depending on non-POSIX behaviour, the operation
ordering behaviour has been around long enough that applications have grown to
depend on it, and consider the filesystem to have a bug when it doesn't
behave that way.
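
As an illustration of the "write A, rename A->B" case above, here is a minimal
sketch of how an application might avoid depending on filesystem-specific
ordering: flush the new file's data with fsync() before renaming it over the
target. The helper name write_then_rename() and its parameters are
hypothetical, not taken from any existing library, and error handling is
simplified (any write error, including EINTR, is treated as failure).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to tmp_path, flush them to stable
 * storage, then rename tmp_path over final_path.
 * Returns 0 on success, -1 on failure (the temporary file is removed).
 */
int write_then_rename(const char *tmp_path, const char *final_path,
                      const void *data, size_t len)
{
    assert(tmp_path && final_path && data);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may be short, so loop until everything is written. */
    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Flush the data before the rename, so that the new name is never
     * made to point at a file whose contents were not yet written out. */
    if (fsync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }

    if (rename(tmp_path, final_path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

This sketch only makes the file contents durable before the rename. Making
the rename itself durable would additionally require syncing the parent
directory.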

If you want to write a robust application, you should fsync() the files you
care about (possibly with AIO so you get a notification on completion rather
than waiting).
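
For the rename(A, B) / rename(C, D) example earlier in the thread, one way an
application could enforce the order itself, rather than rely on the
filesystem, is to fsync() the containing directory between the two renames.
The sketch below assumes all four paths live in the same directory and that
fsync() on a directory file descriptor persists its entries, which is the
usual behaviour on Linux but should be treated as an assumption here. The
helper names fsync_dir() and rename_in_order() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open a directory and fsync() it. Returns 0 on success, -1 on failure. */
int fsync_dir(const char *dir_path)
{
    assert(dir_path);

    int fd = open(dir_path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Hypothetical helper: perform rename(a, b), make it durable, and only
 * then perform rename(c, d). All paths are assumed to be inside dir.
 * Returns 0 on success, -1 on the first failure.
 */
int rename_in_order(const char *dir,
                    const char *a, const char *b,
                    const char *c, const char *d)
{
    assert(dir && a && b && c && d);

    /* op 1, then sync the directory so op 1 reaches disk first. */
    if (rename(a, b) < 0 || fsync_dir(dir) < 0)
        return -1;

    /* op 2 starts only after op 1 has been synced. */
    if (rename(c, d) < 0 || fsync_dir(dir) < 0)
        return -1;

    return 0;
}
```

The cost is one synchronous flush per step, which is why waiting on each
fsync() may be worth avoiding when many files are involved.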

Cheers, Andreas