linux-ext4 - Re: [PATCH 0/5] ext4: direct IO via iomap infrastructure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190824230427.GA32012@infradead.org>
Date:   Sat, 24 Aug 2019 16:04:27 -0700
From:   Christoph Hellwig <hch@...radead.org>
To:     "Darrick J. Wong" <darrick.wong@...cle.com>
Cc:     Matthew Bobrowski <mbobrowski@...browski.org>,
        Ritesh Harjani <riteshh@...ux.ibm.com>, tytso@....edu,
        jack@...e.cz, adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, hch@...radead.org,
        aneesh.kumar@...ux.ibm.com
Subject: Re: [PATCH 0/5] ext4: direct IO via iomap infrastructure

On Fri, Aug 23, 2019 at 08:55:54PM -0700, Darrick J. Wong wrote:
> I'm probably misunderstanding the ext4 extent cache horribly, but I keep
> wondering why any of this is necessary -- why can't ext4 track the
> unwritten status in the extent records directly?  And why is there all
> this strange "can merge" logic?  If you need to convert blocks X to Y
> to written state because a write to those blocks completed, isn't that
> just manipulation of a bunch of incore records?  And can't you just seek
> back and forth in the extent cache to look for adjacent records to merge
> with? <confuseD>

Same here.  I'm not an ext4 expert, but here is what we do in XFS, which
hopefully works in some form for ext4 a well:

 - when starting a direct I/O we allocate any needed blocks and do so
   as unwritten extent.  The extent tree code will merge them in
   whatever way that seems suitable
 - if the IOMAP_DIO_UNWRITTEN is set on the iomap at ->end_io time we
   call a function that walks the whole range covered by the ioend,
   and convert any unwritten extent to a normal written extent.  Any
   splitting and merging will be done as needed by the low-level
   extent tree code
 - this also means we don't need the xfs_ioen structure (which ext4)
   copied from for direct I/O at all (we used to have it initially,
   though including the time when ext4 copied this code).
 - we don't need the equivalent to the ext4_unwritten_wait call in
   ext4_file_write_iter because we serialize any non-aligned I/O
   instead of trying to optimize for weird corner cases


> (I'd really prefer not to go adding private fields all over the
> place...)

Agreed.