linux-kernel - Re: [PATCHv2, RFC 20/30] ramfs: enable transparent huge page cache

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130405083112.GD32126@blaptop>
Date:	Fri, 5 Apr 2013 17:31:12 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Wanpeng Li <liwanp@...ux.vnet.ibm.com>
Cc:	Simon Jeons <simon.jeons@...il.com>,
	Hugh Dickins <hughd@...gle.com>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	Wu Fengguang <fengguang.wu@...el.com>, Jan Kara <jack@...e.cz>,
	Mel Gorman <mgorman@...e.de>, linux-mm@...ck.org,
	Andi Kleen <ak@...ux.intel.com>,
	Matthew Wilcox <matthew.r.wilcox@...el.com>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Hillf Danton <dhillf@...il.com>, Ying Han <yinghan@...gle.com>,
	Christoph Lameter <cl@...ux.com>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv2, RFC 20/30] ramfs: enable transparent huge page cache

Hi Wanpeng,

On Fri, Apr 05, 2013 at 04:22:17PM +0800, Wanpeng Li wrote:
> On Fri, Apr 05, 2013 at 05:01:06PM +0900, Minchan Kim wrote:
> >On Fri, Apr 05, 2013 at 02:47:25PM +0800, Simon Jeons wrote:
> >> Hi Minchan,
> >> On 04/03/2013 09:11 AM, Minchan Kim wrote:
> >> >On Tue, Apr 02, 2013 at 03:15:23PM -0700, Hugh Dickins wrote:
> >> >>On Tue, 2 Apr 2013, Kirill A. Shutemov wrote:
> >> >>>Kirill A. Shutemov wrote:
> >> >>>>From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
> >> >>>>
> >> >>>>ramfs is the most simple fs from page cache point of view. Let's start
> >> >>>>transparent huge page cache enabling here.
> >> >>>>
> >> >>>>For now we allocate only non-movable huge page. It's not yet clear if
> >> >>>>movable page is safe here and what need to be done to make it safe.
> >> >>>>
> >> >>>>Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> >> >>>>---
> >> >>>>  fs/ramfs/inode.c |    6 +++++-
> >> >>>>  1 file changed, 5 insertions(+), 1 deletion(-)
> >> >>>>
> >> >>>>diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
> >> >>>>index c24f1e1..da30b4f 100644
> >> >>>>--- a/fs/ramfs/inode.c
> >> >>>>+++ b/fs/ramfs/inode.c
> >> >>>>@@ -61,7 +61,11 @@ struct inode *ramfs_get_inode(struct super_block *sb,
> >> >>>>  		inode_init_owner(inode, dir, mode);
> >> >>>>  		inode->i_mapping->a_ops = &ramfs_aops;
> >> >>>>  		inode->i_mapping->backing_dev_info = &ramfs_backing_dev_info;
> >> >>>>-		mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
> >> >>>>+		/*
> >> >>>>+		 * TODO: what should be done to make movable safe?
> >> >>>>+		 */
> >> >>>>+		mapping_set_gfp_mask(inode->i_mapping,
> >> >>>>+				GFP_TRANSHUGE & ~__GFP_MOVABLE);
> >> >>>Hugh, I've found old thread with the reason why we have GFP_HIGHUSER here, not
> >> >>>GFP_HIGHUSER_MOVABLE:
> >> >>>
> >> >>>http://lkml.org/lkml/2006/11/27/156
> >> >>>
> >> >>>It seems the origin reason is not longer valid, correct?
> >> >>Incorrect, I believe: so far as I know, the original reason remains
> >> >>valid - though it would only require a couple of good small changes
> >> >>to reverse that - or perhaps you have already made these changes?
> >> >>
> >> >>The original reason is that ramfs pages are not migratable,
> >> >>therefore they should be allocated from an unmovable area.
> >> >>
> >> >>As I understand it (and I would have preferred to run a test to check
> >> >>my understanding before replying, but don't have time for that), ramfs
> >> >>pages cannot be migrated for two reasons, neither of them a good reason.
> >> >>
> >> >>One reason (okay, it wouldn't have been quite this way in 2006) is that
> >> >>ramfs (rightly) calls mapping_set_unevictable(), so its pages will fail
> >> >>the page_evictable() test, so they will be marked PageUnevictable, so
> >> >>__isolate_lru_page() will refuse to isolate them for migration (except
> >> >>for CMA).
> >> >True.
> >> >
> >> >>I am strongly in favour of removing that limitation from
> >> >>__isolate_lru_page() (and the thread you pointed - thank you - shows Mel
> >> >>and Christoph were both in favour too); and note that there is no such
> >> >>restriction in the confusingly similar but different isolate_lru_page().
> >> >>
> >> >>Some people do worry that migrating Mlocked pages would introduce the
> >> >>occasional possibility of a minor fault (with migration_entry_wait())
> >> >>on an Mlocked region which never faulted before.  I tend to dismiss
> >> >>that worry, but maybe I'm wrong to do so: maybe there should be a
> >> >>tunable for realtimey people to set, to prohibit page migration from
> >> >>mlocked areas; but the default should be to allow it.
> >> >I agree.
> >> >Just FYI for mlocked page migration
> >> >
> >> >I tried migratioin of mlocked page and Johannes and Mel had a concern
> >> >about that.
> >> >http://lkml.indiana.edu/hypermail/linux/kernel/1109.0/00175.html
> >> >
> >> >But later, Peter already acked it and I guess by reading the thread that
> >> >Hugh was in favour when page migration was merged first time.
> >> >
> >> >http://marc.info/?l=linux-mm&m=133697873414205&w=2
> >> >http://marc.info/?l=linux-mm&m=133700341823358&w=2
> >> >
> >> >Many people said mlock means memory-resident, NOT pinning so it could
> >> >allow minor fault while Mel still had a concern except CMA.
> >> >http://marc.info/?l=linux-mm&m=133674219714419&w=2
> >> 
> >> How about add a knob?
> >
> >Maybe, volunteering?
> 
> Hi Minchan,
> 
> I can be the volunteer, what I care is if add a knob make sense?

Frankly sepaking, I'd like to avoid new knob but there might be
some workloads suffered from mlocked page migration so we coudn't
dismiss it. In such case, introducing the knob would be a solution
with default enabling. If we don't have any report for a long time,
we can remove the knob someday, IMHO.

Thanks.

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/