Message-ID: <20190827095221.GA1568@poseidon.bobrowski.net>
Date:   Tue, 27 Aug 2019 19:52:23 +1000
From:   Matthew Bobrowski <mbobrowski@...browski.org>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     "Darrick J. Wong" <darrick.wong@...cle.com>,
        Ritesh Harjani <riteshh@...ux.ibm.com>, tytso@....edu,
        jack@...e.cz, adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, aneesh.kumar@...ux.ibm.com
Subject: Re: [PATCH 0/5] ext4: direct IO via iomap infrastructure

On Sat, Aug 24, 2019 at 04:04:27PM -0700, Christoph Hellwig wrote:
> On Fri, Aug 23, 2019 at 08:55:54PM -0700, Darrick J. Wong wrote:
> > I'm probably misunderstanding the ext4 extent cache horribly, but I keep
> > wondering why any of this is necessary -- why can't ext4 track the
> > unwritten status in the extent records directly?  And why is there all
> > this strange "can merge" logic?  If you need to convert blocks X to Y
> > to written state because a write to those blocks completed, isn't that
> > just manipulation of a bunch of incore records?  And can't you just seek
> > back and forth in the extent cache to look for adjacent records to merge
> > with? <confuseD>
> 
> Same here.  I'm not an ext4 expert, but here is what we do in XFS, which
> hopefully works in some form for ext4 as well:
> 
>  - when starting a direct I/O we allocate any needed blocks and do so
>    as unwritten extent.  The extent tree code will merge them in
>    whatever way that seems suitable
>  - if the IOMAP_DIO_UNWRITTEN is set on the iomap at ->end_io time we
>    call a function that walks the whole range covered by the ioend,
>    and convert any unwritten extent to a normal written extent.  Any
>    splitting and merging will be done as needed by the low-level
>    extent tree code
>  - this also means we don't need the xfs_ioend structure (which ext4
>    copied from) for direct I/O at all (we used to have it initially,
>    though, including at the time when ext4 copied this code).
>  - we don't need the equivalent to the ext4_unwritten_wait call in
>    ext4_file_write_iter because we serialize any non-aligned I/O
>    instead of trying to optimize for weird corner cases

Yeah, so what you've detailed above is essentially the approach I've
taken in my patch series...

What is not clear to me at this point, though, is whether it is still
necessary to explicitly track unwritten extents via in-core inode
attributes, i.e. ->i_unwritten and ->i_state_flags, under the new
direct IO code path, which makes use of the iomap infrastructure. Or
can we get away with not using these in-core inode attributes at all
and rely just on checks against the extent record directly, as briefly
mentioned by Darrick? I would think that this type of check would be
enough. However, the checks around whether the inode is currently
undergoing direct IO were implemented at some point (a9b8241594add),
so presumably there is a reason for having them?

--M
