[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20131014135642.05DFEE0090@blue.fi.intel.com>
Date: Mon, 14 Oct 2013 16:56:41 +0300 (EEST)
From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To: Dave Chinner <david@...morbit.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Al Viro <viro@...iv.linux.org.uk>,
Hugh Dickins <hughd@...gle.com>,
Wu Fengguang <fengguang.wu@...el.com>, Jan Kara <jack@...e.cz>,
Mel Gorman <mgorman@...e.de>, linux-mm@...ck.org,
Andi Kleen <ak@...ux.intel.com>,
Matthew Wilcox <willy@...ux.intel.com>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Hillf Danton <dhillf@...il.com>, Dave Hansen <dave@...1.net>,
Ning Qu <quning@...gle.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything
but mmap()
Dave Chinner wrote:
> On Wed, Sep 25, 2013 at 12:51:04PM +0300, Kirill A. Shutemov wrote:
> > Andrew Morton wrote:
> > > On Mon, 23 Sep 2013 15:05:28 +0300 "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> wrote:
> > >
> > > > It brings thp support for ramfs, but without mmap() -- it will be posted
> > > > separately.
> > >
> > > We were never going to do this :(
> > >
> > > Has anyone reviewed these patches much yet?
> >
> > Dave did very good review. Few other people looked to separate patches.
> > See Reviewed-by/Acked-by tags in patches.
> >
> > It looks like most mm experts are busy with numa balancing nowadays, so
> > it's hard to get more review.
>
> Nobody has reviewed it from the filesystem side, though.
>
> The changes that require special code paths for huge pages in the
> write_begin/write_end paths are nasty. You're adding conditional
> code that depends on the page size and then having to add checks to
> ensure that large page operations don't step over small page
> boundaries and other such corner cases. It's an extremely fragile
> design, IMO.
>
> In general, I don't like all the if (thp) {} else {}; code that this
> series introduces - they are code paths that simply won't get tested
> with any sort of regularity and make the code more complex for those
> that aren't using THP to understand and debug...
Okay, I'll try to get rid of special cases where it's possible.
> Then there is a new per-inode lock that is used in
> generic_perform_write() which is held across page faults and calls
> to filesystem block mapping callbacks. This inserts into the middle
> of an existing locking chain that needs to be strictly ordered, and
> as such will lead to the same type of lock inversion problems that
> the mmap_sem had. We do not want to introduce a new lock that has
> this same problem just as we are getting rid of that long standing
> nastiness from the page fault path...
I don't see how we can protect against splitting with existing locks,
but I'll try find a way.
> I also note that you didn't convert invalidate_inode_pages2_range()
> to support huge pages which is needed by real filesystems that
> support direct IO. There are other truncate/invalidate interfaces
> that you didn't convert, either, and some of them will present you
> with interesting locking challenges as a result of adding that new
> lock...
Thanks. I'll take a look on these code paths.
> > The patchset was mostly ignored for few rounds and Dave suggested to split
> > to have less scary patch number.
>
> It's still being ignored by filesystem people because you haven't
> actually tried to implement support into a real filesystem.....
If it will support a real filesystem, wouldn't it be ignored due
patch count? ;)
> > > > Please review and consider applying.
> > >
> > > It appears rather too immature at this stage.
> >
> > More review is always welcome and I'm committed to address issues.
>
> IMO, supporting a real block based filesystem like ext4 or XFS and
> demonstrating that everything works is necessary before we go any
> further...
Will see what numbers I can bring in next iterations.
Thanks for your feedback. And sorry for late answer.
--
Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists