[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160922123143.GO2834@quack2.suse.cz>
Date: Thu, 22 Sep 2016 14:31:43 +0200
From: Jan Kara <jack@...e.cz>
To: Theodore Ts'o <tytso@....edu>
Cc: Christoph Hellwig <hch@...radead.org>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] ext4: optimize ext4 direct I/O locking for reading
On Wed 21-09-16 10:37:48, Ted Tso wrote:
> On Wed, Sep 21, 2016 at 06:26:09AM -0700, Christoph Hellwig wrote:
> > Is there any chance you could look into simplifying the locking instead
> > of making it even more complicated? Since Al replaced i_mutex with
> > i_rwsem you can now easily take that in shared mode. E.g. if you'd
> > move to a direct I/O model closer to XFS, ocfs2 and NFS where you always
> > take i_rwsem in shared mode you'll get the scalibility of the
> > dioread_nolock case while no having to do these crazy dances, and you
> > can also massively simplify the codebase. Similarly you can demote it
> > from exclusive to shared after allocating blocks in the write path,
> > and you'll end up with something way easier to understand.
>
> Unfortunately, in order to do this we need to extend the
> dioread_nolock handling for sub-page block sizes. (This is where we
> insert the allocated blocks into the extent maps marked uninitialized,
> and only converting the extent from uninitialized to initialized ---
> which today only works when the page size == block size.)
>
> This is on my todo list, but half of the problem is the mess caused by
> needing to iterate over the circularly linked buffer heads when there
> are multiple buffer heads covering the page. I was originally
> assuming that it would be easier to fix this after doing the bh ->
> iomap conversion, but it's in a while before I looked into this
> particular change. I can try to take a closer look again....
>
> The main reason why I looked into this hack --- and I will be the
> first to agree it was a hack, is because I had a request to support
> the dioread_nolock scalability on a Little Endian PowerPC system which
> has 64k page sizes.
>
> > Sorry for the rant, but I just had to dig into this code when looking
> > at converting ext4 to the new DAX path, and my eyes still bleed..
>
> Yeah, I know, and I'm sorry. There's quite a bit of technical debt
> there, which I do want to clean up.
So I think what Christoph meant in this case is something like attached
patch. That achieves more than your dirty hack in a much cleaner way.
Beware, the patch is only compile-tested.
Then there is the case of unlocked direct IO overwrites which we allow to
run without inode_lock in dioread_nolock mode as well and that is more
difficult to resolve (there lay the problems with blocksize < pagesize you
speak about).
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
View attachment "0001-ext4-Allow-parallel-DIO-reads.patch" of type "text/x-patch" (2656 bytes)
Powered by blists - more mailing lists