linux-ext4 - Re: [PATCH v3 09/13] ext4: fast-commit commit path changes


Message-ID: <20191024201800.GE1124@mit.edu>
Date:   Thu, 24 Oct 2019 16:18:00 -0400
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Xiaohui1 Li 李晓辉 <lixiaohui1@...omi.com>
Cc:     "lixiaohui1@...omi.corp-partner.google.com" 
        <lixiaohui1@...omi.corp-partner.google.com>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH v3 09/13] ext4: fast-commit commit path changes

On Thu, Oct 24, 2019 at 06:54:44AM +0000, Xiaohui1 Li 李晓辉 wrote:
> 
> But i also have an idea which can simplify the fast commit patch.
> because we want to fix fsync cost too much time problems on our
> mobile phone without format the whole ext4 partition , and i found
> current fast commit patch can't do this job as it need to
> readjustment the layout of journal area and will destroy phone
> user's data from my opinion .

That's not correct.  The fast commit feature can be added to an
existing ext4 file system.  That's because when the ext4 file system
is mounted (or when e2fsck is run) the contents of the file system
journal (if any) are replayed and then discarded.  On a clean shutdown,
the journal is empty to begin with.

Hence, restructuring the journal so that a portion of the space can be
used for fast commits can be done without modifying or otherwise
destroying the data on the pre-existing file system.

> so my simplify idea is that:
> when jbd2 thread begin to commit the current transaction , why not
> divide the commiting work into two sub work ? firstly flush metadata
> generated by fsynced handles to disk, and then append a commit end
> block. and then tell the fsync threads that no need to wait, as
> their metadata has already been flush to disk journal area, the
> fsync work is finished.  and then the second sub work is to
> committing metadata and data generated by left handles in current
> transaction.

The problem, as I stated in my earlier message, is that the handles
that were not involved in the fsync in many cases will have been
started and completed before the changes reflected by the handles
involving the inode to be fsync'ed.  We can't just "separate out the
handles" and commit the ones that are necessary, and then do the rest
in a separate transaction.  The problem is entangled dependencies.  For
example, one of the handles not involved with the fsync may have
modified the inode table or the allocation bitmap that is involved
with the update to the inode to be fsync'ed.  We can't just flush the
metadata blocks involved with the "fsync handles", since they will
include the modifications made by other file system operations via
"the rest of the handles."
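
To make the entanglement concrete, here is a toy Python model.  It is
not jbd2 code: the class and function names are hypothetical, and the
only assumption is that the journal logs whole metadata blocks.  It
shows that the blocks needed to commit the fsync'ed handle B also carry
changes made by the unrelated handle A.

```python
# Toy model, not jbd2 code: every name here is hypothetical.
# Assumption: the journal logs whole metadata blocks, so a block
# dirtied by two handles cannot be written "for one handle only".

class MetadataBlock:
    def __init__(self, name):
        self.name = name
        self.fields = {}          # e.g. bitmap bits or inode-table slots
        self.modified_by = set()  # handles that dirtied this block

def modify(block, handle, field, value):
    """Record a change made to `block` under `handle`."""
    block.fields[field] = value
    block.modified_by.add(handle)

def blocks_for_handles(blocks, handles):
    """Blocks that would have to be journalled to commit only `handles`."""
    return [b for b in blocks if b.modified_by & handles]

def foreign_handles(blocks, handles):
    """Handles outside `handles` whose changes ride along in those blocks."""
    riders = set()
    for b in blocks_for_handles(blocks, handles):
        riders |= b.modified_by - handles
    return riders

bitmap = MetadataBlock("block bitmap")
inode_table = MetadataBlock("inode table")

# Handle A (not fsync'ed) starts and completes first.
modify(bitmap, "A", "bit 100", 1)
modify(inode_table, "A", "inode 11", "points at block 100")
# Handle B (the fsync'ed inode) touches the same two blocks afterwards.
modify(bitmap, "B", "bit 101", 1)
modify(inode_table, "B", "inode 12", "points at block 101")

entangled = foreign_handles([bitmap, inode_table], {"B"})
print(sorted(entangled))  # ['A']
```

Committing "only B" would therefore write A's bitmap bit and inode slot
to the journal as well, whether or not the rest of A's work is included.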

So no, we can't do what you are suggesting.  If it were that easy, we
would have done it a long time ago.

The reason why you can't separate out some of the handles from others
is referenced in the LWN article, "Soft Updates, Hard Problems"[1].
What you are suggesting is not exactly soft updates, but it suffers
from the same problem, namely that of entangled updates, where the
same block is modified by multiple handles.  If you track all of the
logical dependencies, you could potentially "roll back" in memory
those changes which are not yet committed, and then after commit of
the "fsync handles", roll them forward again.  But this is hopelessly
complicated to get right.

[1] https://lwn.net/Articles/339337/
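
The roll back / roll forward idea can be sketched as follows.  This is
a deliberately naive Python illustration, not anything the kernel does:
the undo-log format and the functions roll_back() and roll_forward()
are assumptions made up for the example.  It handles the easy case,
where the two handles touched different fields of a block, and refuses
as soon as a kept change was built on top of an excluded one.

```python
# Naive sketch with hypothetical names; not kernel code.
# Assumption: each change is logged as (handle, key, old, new).

def roll_back(fields, log, excluded):
    """Undo, in place, the changes made by the `excluded` handles.

    `log` lists (handle, key, old, new) in the order applied.  Refuses
    when a kept handle later changed a key that an excluded handle also
    changed, because the kept value was built on the excluded one.
    """
    kept_later = set()
    undone = []
    for entry in reversed(log):
        handle, key, old, new = entry
        if handle not in excluded:
            kept_later.add(key)
        elif key in kept_later:
            raise ValueError(f"{key}: entangled with a later kept change")
        else:
            undone.append(entry)
    for handle, key, old, new in undone:  # newest change first
        fields[key] = old
    return undone

def roll_forward(fields, undone):
    """Re-apply the changes that roll_back() removed."""
    for handle, key, old, new in reversed(undone):  # oldest change first
        fields[key] = new

# Easy case: A and B set different bits in the same bitmap block.
bitmap = {"bit 100": 1, "bit 101": 1}
log = [("A", "bit 100", 0, 1), ("B", "bit 101", 0, 1)]
undone = roll_back(bitmap, log, excluded={"A"})
journal_image = dict(bitmap)   # what the "fsync-only" commit would log
roll_forward(bitmap, undone)
print(journal_image)  # {'bit 100': 0, 'bit 101': 1}

# Hard case: both handles changed the same field, B on top of A.
inode = {"i_links_count": 3}
inode_log = [("A", "i_links_count", 1, 2), ("B", "i_links_count", 2, 3)]
try:
    roll_back(inode, inode_log, excluded={"A"})
except ValueError as err:
    print("cannot separate:", err)
```

Even this toy version needs a per-field undo log for every metadata
block and still gives up on the second case, which gives some feel for
why doing this correctly for a real file system is so hard.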

So if you implemented your suggestion, and the system were to crash
between the first and second commit, the file system would be
corrupted, and in the worst case, e2fsck might not be able to recover
the file system, and all of the user's data would be lost.  Of course,
if you are sure that your system will never crash, because the kernel
is bug-free(tm), then you could skip using the journalling altogether.....
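
One way such a crash could go wrong, as a toy Python simulation.  The
block names, the replay() and check() helpers, and the scenario itself
are assumptions chosen for illustration; real journal recovery and
e2fsck are far more involved.  Handle A frees block 100 from inode 11,
handle B (the fsync'ed one) allocates block 101 to inode 12, and both
touch the same bitmap block.

```python
# Toy crash simulation; all names and the layout are hypothetical.

def replay(disk, journal):
    """Apply every journalled block image to a copy of the on-disk state."""
    recovered = {name: dict(fields) for name, fields in disk.items()}
    for name, image in journal:
        recovered[name] = dict(image)
    return recovered

def check(state):
    """Return (inode, block) pairs referenced by an inode but marked free."""
    bitmap = state["bitmap"]
    errors = []
    for name, fields in state.items():
        if name == "bitmap":
            continue
        for inode, data_block in fields.items():
            if data_block is not None and bitmap.get(data_block, 0) == 0:
                errors.append((inode, data_block))
    return errors

# On-disk state before either handle ran.
disk = {
    "bitmap":  {100: 1, 101: 0},
    "itable0": {"inode 11": 100},
    "itable1": {"inode 12": None},
}
# In-memory state after handle A (truncate inode 11) and handle B
# (allocate block 101 to inode 12) have both completed.
memory = {
    "bitmap":  {100: 0, 101: 1},
    "itable0": {"inode 11": None},
    "itable1": {"inode 12": 101},
}

# "Fsync-only" first commit: just the blocks that handle B touched.
partial_journal = [("bitmap", memory["bitmap"]),
                   ("itable1", memory["itable1"])]
after_partial = replay(disk, partial_journal)  # crash before 2nd commit
print(check(after_partial))  # [('inode 11', 100)]

# Ordinary commit: every dirty block goes in one transaction.
full_journal = [(name, image) for name, image in memory.items()]
after_full = replay(disk, full_journal)
print(check(after_full))  # []
```

After the partial commit and crash, inode 11 still points at block 100
while the bitmap says the block is free, so the block can be handed to
another file and the two files then overwrite each other's data.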

						- Ted

P.S.  It's actually a little bit more complicated than that; you also
need to worry about power drops, so the battery needs to be embedded,
so there is no chance the battery will come flying out when the phone
is dropped.  The EC also has to be able to give a low-battery warning
so that the system can be shut down cleanly before the battery power
goes to zero, and you can't allow the emergency poweroff where the
user pushes and holds the power button for eight seconds.  The last,
after all, won't be needed because we are making the hopelessly
unrealistic assumption that the kernel is completely, 100%,
bug-free(tm).   :-)