linux-ext4 - Re: [PATCH 0/5] ext4: direct IO via iomap infrastructure

Message-ID: <20190828142729.GB24857@mit.edu>
Date:   Wed, 28 Aug 2019 10:27:29 -0400
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Matthew Bobrowski <mbobrowski@...browski.org>
Cc:     Christoph Hellwig <hch@...radead.org>,
        "Darrick J. Wong" <darrick.wong@...cle.com>,
        Ritesh Harjani <riteshh@...ux.ibm.com>, jack@...e.cz,
        adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, aneesh.kumar@...ux.ibm.com
Subject: Re: [PATCH 0/5] ext4: direct IO via iomap infrastructure

On Wed, Aug 28, 2019 at 10:05:11PM +1000, Matthew Bobrowski wrote:
> > What is not clear to me at this point though is whether it is still
> > necessary to explicitly track unwritten extents via in-core inode
> > attributes i.e. ->i_unwritten and ->i_state_flags under the new direct
> > IO code path implementation, which makes use of the iomap
> > infrastructure. Or, whether we can get away with simply not using
> > these in-core inode attributes and rely just on checks against the
> > extent record directly, as briefly mentioned by Darrick. I would think
> > that this type of check would be enough, however the checks around
> > whether the inode is currently undergoing direct IO were implemented
> > at some point, so there must be a reason for having them
> > (a9b8241594add)?

The original reason why we created the DIO_STATE_UNWRITTEN flag was to
provide a fast path for the common case, which is writing blocks to an
existing location in a file where the blocks are already allocated and
marked as written.  Consulting the on-disk extent tree to determine
whether unwritten extents need to be converted and/or split is
certainly doable.  However, it's expensive for the common case.  So
having a hint whether we need to schedule a workqueue to possibly
convert an unwritten region is helpful.  If we can just free the bio
and exit the I/O completion handler without having to take shared
locks to examine the on-disk extent tree, so much the better.
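
To make the fast-path idea concrete, here is a minimal user-space sketch
in C.  It is a toy model, not the ext4 code: the struct names, the
`unwritten_hint` field, and the counters are all hypothetical, and the
"deferred" branch only counts where a real handler would queue work to
convert or split an unwritten extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-I/O completion record (not an ext4 type). */
struct toy_io_end {
    bool unwritten_hint;    /* set at submit time if the write may touch an unwritten extent */
};

/* Hypothetical counters so the two completion paths can be observed. */
struct toy_stats {
    unsigned fast_path;     /* completions that returned immediately */
    unsigned deferred;      /* completions that would schedule conversion work */
};

/* Models the I/O completion handler: it consults only the in-core hint. */
static void toy_end_io(const struct toy_io_end *io, struct toy_stats *st)
{
    assert(io != NULL && st != NULL);

    if (!io->unwritten_hint) {
        /* Common case: blocks were already allocated and written,
         * so no lock is taken and no extent tree lookup happens. */
        st->fast_path++;
        return;
    }

    /* Rare case: a real handler would hand off to a worker that takes
     * the locks and examines the extent tree; here we only count it. */
    st->deferred++;
}
```

The point of the design is that the cost of examining the extent tree is
paid only by the I/Os that were flagged at submit time; plain overwrites
never reach that branch.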

> Maybe it's a silly question, although I'm wanting to clarify my
> understanding around why it is that when we either try to prepend or
> append to an existing extent, we don't permit merging of extents if

If I recall correctly, the reason for this check was mainly the
concern that we would end up merging an extent that we would then have
to split later on (when the direct I/O completed).

To be honest, I'm not 100% sure what would happen if we removed that
restriction; it might be that things would work just fine (just slower
in some workloads), or there might be some hidden dependency that
would explode.  I suspect we'd have to try the experiment to be sure.
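
As a rough illustration of the kind of restriction being discussed, the
sketch below models a merge check in plain C.  Everything in it is an
assumption made for illustration: the `toy_extent` struct, the
`dio_unwritten_pending` parameter, and the exact conditions are
hypothetical and deliberately ignore physical block contiguity and
length limits.  It is not the actual ext4 merge logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified extent: logical range plus written/unwritten state. */
struct toy_extent {
    unsigned long start;    /* first logical block */
    unsigned long len;      /* number of blocks */
    bool unwritten;         /* true if the extent is allocated but unwritten */
};

/*
 * Returns true if extent b may be merged onto the end of extent a.
 * dio_unwritten_pending models "direct I/O to unwritten extents is in
 * flight on this inode"; while it is set, unwritten extents are left
 * unmerged so they do not have to be split again at I/O completion.
 */
static bool toy_can_merge(const struct toy_extent *a,
                          const struct toy_extent *b,
                          bool dio_unwritten_pending)
{
    assert(a != NULL && b != NULL);

    if (a->start + a->len != b->start)
        return false;               /* not logically adjacent */
    if (a->unwritten != b->unwritten)
        return false;               /* states differ */
    if (a->unwritten && dio_unwritten_pending)
        return false;               /* avoid a merge we would likely undo */
    return true;
}
```

In this model, dropping the last check would not change correctness by
itself; whether that also holds for the real code is exactly the open
question raised above.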

                                        - Ted