lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 22 Sep 2016 14:31:43 +0200
From:   Jan Kara <jack@...e.cz>
To:     Theodore Ts'o <tytso@....edu>
Cc:     Christoph Hellwig <hch@...radead.org>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] ext4: optimize ext4 direct I/O locking for reading

On Wed 21-09-16 10:37:48, Ted Tso wrote:
> On Wed, Sep 21, 2016 at 06:26:09AM -0700, Christoph Hellwig wrote:
> > Is there any chance you could look into simplifying the locking instead
> > of making it even more complicated?  Since Al replaced i_mutex with
> > i_rwsem you can now easily take that in shared mode.  E.g. if you'd
> > move to a direct I/O model closer to XFS, ocfs2 and NFS where you always
> > take i_rwsem in shared mode you'll get the scalibility of the
> > dioread_nolock case while no having to do these crazy dances, and you
> > can also massively simplify the codebase.  Similarly you can demote it
> > from exclusive to shared after allocating blocks in the write path,
> > and you'll end up with something way easier to understand.
> 
> Unfortunately, in order to do this we need to extend the
> dioread_nolock handling for sub-page block sizes.  (This is where we
> insert the allocated blocks into the extent maps marked uninitialized,
> and only converting the extent from uninitialized to initialized ---
> which today only works when the page size == block size.)
> 
> This is on my todo list, but half of the problem is the mess caused by
> needing to iterate over the circularly linked buffer heads when there
> are multiple buffer heads covering the page.  I was originally
> assuming that it would be easier to fix this after doing the bh ->
> iomap conversion, but it's in a while before I looked into this
> particular change.  I can try to take a closer look again....
> 
> The main reason why I looked into this hack --- and I will be the
> first to agree it was a hack, is because I had a request to support
> the dioread_nolock scalability on a Little Endian PowerPC system which
> has 64k page sizes.
> 
> > Sorry for the rant, but I just had to dig into this code when looking
> > at converting ext4 to the new DAX path, and my eyes still bleed..
> 
> Yeah, I know, and I'm sorry.  There's quite a bit of technical debt
> there, which I do want to clean up.

So I think what Christoph meant in this case is something like attached
patch. That achieves more than your dirty hack in a much cleaner way.
Beware, the patch is only compile-tested.

Then there is the case of unlocked direct IO overwrites which we allow to
run without inode_lock in dioread_nolock mode as well and that is more
difficult to resolve (there lay the problems with blocksize < pagesize you
speak about).

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

View attachment "0001-ext4-Allow-parallel-DIO-reads.patch" of type "text/x-patch" (2656 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ