lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 26 Sep 2013 09:29:15 +1000
From:	Dave Chinner <david@...morbit.com>
To:	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Al Viro <viro@...iv.linux.org.uk>,
	Hugh Dickins <hughd@...gle.com>,
	Wu Fengguang <fengguang.wu@...el.com>, Jan Kara <jack@...e.cz>,
	Mel Gorman <mgorman@...e.de>, linux-mm@...ck.org,
	Andi Kleen <ak@...ux.intel.com>,
	Matthew Wilcox <willy@...ux.intel.com>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Hillf Danton <dhillf@...il.com>, Dave Hansen <dave@...1.net>,
	Ning Qu <quning@...gle.com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything
 but mmap()

On Wed, Sep 25, 2013 at 12:51:04PM +0300, Kirill A. Shutemov wrote:
> Andrew Morton wrote:
> > On Mon, 23 Sep 2013 15:05:28 +0300 "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> wrote:
> > 
> > > It brings thp support for ramfs, but without mmap() -- it will be posted
> > > separately.
> > 
> > We were never going to do this :(
> > 
> > Has anyone reviewed these patches much yet?
> 
> Dave did very good review. Few other people looked to separate patches.
> See Reviewed-by/Acked-by tags in patches.
> 
> It looks like most mm experts are busy with numa balancing nowadays, so
> it's hard to get more review.

Nobody has reviewed it from the filesystem side, though.

The changes that require special code paths for huge pages in the
write_begin/write_end paths are nasty. You're adding conditional
code that depends on the page size and then having to add checks to
ensure that large page operations don't step over small page
boundaries and other such corner cases. It's an extremely fragile
design, IMO.

In general, I don't like all the if (thp) {} else {}; code that this
series introduces - they are code paths that simply won't get tested
with any sort of regularity and make the code more complex for those
that aren't using THP to understand and debug...

Then there is a new per-inode lock that is used in
generic_perform_write() which is held across page faults and calls
to filesystem block mapping callbacks. This inserts into the middle
of an existing locking chain that needs to be strictly ordered, and
as such will lead to the same type of lock inversion problems that
the mmap_sem had.  We do not want to introduce a new lock that has
this same problem just as we are getting rid of that long standing
nastiness from the page fault path...

I also note that you didn't convert invalidate_inode_pages2_range()
to support huge pages which is needed by real filesystems that
support direct IO. There are other truncate/invalidate interfaces
that you didn't convert, either, and some of them will present you
with interesting locking challenges as a result of adding that new
lock...

> The patchset was mostly ignored for few rounds and Dave suggested to split
> to have less scary patch number.

It's still being ignored by filesystem people because you haven't
actually tried to implement support into a real filesystem.....

> > > Please review and consider applying.
> > 
> > It appears rather too immature at this stage.
> 
> More review is always welcome and I'm committed to address issues.

IMO, supporting a real block based filesystem like ext4 or XFS and
demonstrating that everything works is necessary before we go any
further...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ