linux-kernel - Re: [PATCHv2, RFC 13/30] thp, mm: implement grab_cache_huge_page_write

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <514B4E2B.2010506@sr71.net>
Date:	Thu, 21 Mar 2013 11:15:07 -0700
From:	Dave Hansen <dave@...1.net>
To:	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
CC:	Andrea Arcangeli <aarcange@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	Hugh Dickins <hughd@...gle.com>,
	Wu Fengguang <fengguang.wu@...el.com>, Jan Kara <jack@...e.cz>,
	Mel Gorman <mgorman@...e.de>, linux-mm@...ck.org,
	Andi Kleen <ak@...ux.intel.com>,
	Matthew Wilcox <matthew.r.wilcox@...el.com>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Hillf Danton <dhillf@...il.com>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCHv2, RFC 13/30] thp, mm: implement grab_cache_huge_page_write_begin()

On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
> 
> The function is grab_cache_page_write_begin() twin but it tries to
> allocate huge page at given position aligned to HPAGE_CACHE_NR.

The obvious question, then, is whether we should just replace
grab_cache_page_write_begin() with this code and pass in HPAGE_CACHE_NR
or 1 based on whether we're doing a huge or normal page.

> diff --git a/mm/filemap.c b/mm/filemap.c
> index 38fdc92..bdedb1b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2332,6 +2332,64 @@ found:
>  }
>  EXPORT_SYMBOL(grab_cache_page_write_begin);
>  
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +/*
> + * Find or create a huge page at the given pagecache position, aligned to
> + * HPAGE_CACHE_NR. Return the locked huge page.
> + *
> + * If, for some reason, it's not possible allocate a huge page at this
> + * possition, it returns NULL. Caller should take care of fallback to small
> + * pages.
> + *
> + * This function is specifically for buffered writes.
> + */
> +struct page *grab_cache_huge_page_write_begin(struct address_space *mapping,
> +		pgoff_t index, unsigned flags)
> +{
> +	int status;
> +	gfp_t gfp_mask;
> +	struct page *page;
> +	gfp_t gfp_notmask = 0;
> +
> +	BUG_ON(index & HPAGE_CACHE_INDEX_MASK);

--
> +	gfp_mask = mapping_gfp_mask(mapping);
> +	BUG_ON(!(gfp_mask & __GFP_COMP));
> +	if (mapping_cap_account_dirty(mapping))
> +		gfp_mask |= __GFP_WRITE;
> +	if (flags & AOP_FLAG_NOFS)
> +		gfp_notmask = __GFP_FS;

This whole hunk is both non-obvious and copy-n-pasted from
grab_cache_page_write_begin().  That makes me worry that bugs/features
will get added/removed in one and not the other.  I really think they
need to get consolidated somehow.

> +repeat:
> +	page = find_lock_page(mapping, index);
> +	if (page) {
> +		if (!PageTransHuge(page)) {
> +			unlock_page(page);
> +			page_cache_release(page);
> +			return NULL;
> +		}
> +		goto found;
> +	}
> +
> +	page = alloc_pages(gfp_mask & ~gfp_notmask, HPAGE_PMD_ORDER);

I alluded to this a second ago, but what's wrong with alloc_hugepage()?

> +	if (!page) {
> +		count_vm_event(THP_WRITE_FAILED);
> +		return NULL;
> +	}
> +
> +	count_vm_event(THP_WRITE_ALLOC);
> +	status = add_to_page_cache_lru(page, mapping, index,
> +			GFP_KERNEL & ~gfp_notmask);
> +	if (unlikely(status)) {
> +		page_cache_release(page);
> +		if (status == -EEXIST)
> +			goto repeat;
> +		return NULL;
> +	}

I'm rather un-fond of sprinking likely/unlikelies around.  But, I guess
this is really just copied from the existing one.  <sigh>

> +found:
> +	wait_on_page_writeback(page);
> +	return page;
> +}
> +#endif

So, I diffed :

-struct page *grab_cache_page_write_begin(struct address_space
vs.
+struct page *grab_cache_huge_page_write_begin(struct address_space

They're just to similar to ignore.  Please consolidate them somehow.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/