lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20131014135642.05DFEE0090@blue.fi.intel.com>
Date:	Mon, 14 Oct 2013 16:56:41 +0300 (EEST)
From:	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Al Viro <viro@...iv.linux.org.uk>,
	Hugh Dickins <hughd@...gle.com>,
	Wu Fengguang <fengguang.wu@...el.com>, Jan Kara <jack@...e.cz>,
	Mel Gorman <mgorman@...e.de>, linux-mm@...ck.org,
	Andi Kleen <ak@...ux.intel.com>,
	Matthew Wilcox <willy@...ux.intel.com>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Hillf Danton <dhillf@...il.com>, Dave Hansen <dave@...1.net>,
	Ning Qu <quning@...gle.com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything
 but mmap()

Dave Chinner wrote:
> On Wed, Sep 25, 2013 at 12:51:04PM +0300, Kirill A. Shutemov wrote:
> > Andrew Morton wrote:
> > > On Mon, 23 Sep 2013 15:05:28 +0300 "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> wrote:
> > > 
> > > > It brings thp support for ramfs, but without mmap() -- it will be posted
> > > > separately.
> > > 
> > > We were never going to do this :(
> > > 
> > > Has anyone reviewed these patches much yet?
> > 
> > Dave did very good review. Few other people looked to separate patches.
> > See Reviewed-by/Acked-by tags in patches.
> > 
> > It looks like most mm experts are busy with numa balancing nowadays, so
> > it's hard to get more review.
> 
> Nobody has reviewed it from the filesystem side, though.
> 
> The changes that require special code paths for huge pages in the
> write_begin/write_end paths are nasty. You're adding conditional
> code that depends on the page size and then having to add checks to
> ensure that large page operations don't step over small page
> boundaries and other such corner cases. It's an extremely fragile
> design, IMO.
> 
> In general, I don't like all the if (thp) {} else {}; code that this
> series introduces - they are code paths that simply won't get tested
> with any sort of regularity and make the code more complex for those
> that aren't using THP to understand and debug...

Okay, I'll try to get rid of special cases where it's possible.

> Then there is a new per-inode lock that is used in
> generic_perform_write() which is held across page faults and calls
> to filesystem block mapping callbacks. This inserts into the middle
> of an existing locking chain that needs to be strictly ordered, and
> as such will lead to the same type of lock inversion problems that
> the mmap_sem had.  We do not want to introduce a new lock that has
> this same problem just as we are getting rid of that long standing
> nastiness from the page fault path...

I don't see how we can protect against splitting with existing locks,
but I'll try find a way.

> I also note that you didn't convert invalidate_inode_pages2_range()
> to support huge pages which is needed by real filesystems that
> support direct IO. There are other truncate/invalidate interfaces
> that you didn't convert, either, and some of them will present you
> with interesting locking challenges as a result of adding that new
> lock...

Thanks. I'll take a look on these code paths.

> > The patchset was mostly ignored for few rounds and Dave suggested to split
> > to have less scary patch number.
> 
> It's still being ignored by filesystem people because you haven't
> actually tried to implement support into a real filesystem.....

If it will support a real filesystem, wouldn't it be ignored due
patch count? ;)

> > > > Please review and consider applying.
> > > 
> > > It appears rather too immature at this stage.
> > 
> > More review is always welcome and I'm committed to address issues.
> 
> IMO, supporting a real block based filesystem like ext4 or XFS and
> demonstrating that everything works is necessary before we go any
> further...

Will see what numbers I can bring in next iterations.

Thanks for your feedback. And sorry for late answer.

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ