Message-ID: <55785B5E.3000306@suse.cz>
Date: Wed, 10 Jun 2015 17:44:30 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Hugh Dickins <hughd@...gle.com>
CC: Dave Hansen <dave.hansen@...el.com>, Mel Gorman <mgorman@...e.de>,
Rik van Riel <riel@...hat.com>,
Christoph Lameter <cl@...two.org>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Steve Capper <steve.capper@...aro.org>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...e.cz>,
Jerome Marchand <jmarchan@...hat.com>,
Sasha Levin <sasha.levin@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCHv6 32/36] thp: reintroduce split_huge_page()
On 06/03/2015 07:06 PM, Kirill A. Shutemov wrote:
> This patch adds an implementation of split_huge_page() for the new
> refcounting.
>
> Unlike the previous implementation, the new split_huge_page() can fail
> if somebody holds a GUP pin on the page. It also means that a pin on a
> page will prevent it from being split under you. This makes the
> situation in many places much cleaner.
>
> The basic scheme of split_huge_page():
>
> - Check that the sum of mapcounts of all subpages is equal to
> page_count() plus one (the caller's pin). Fail with -EBUSY otherwise.
> This way we can avoid useless PMD splits.
>
> - Freeze the page counters by splitting all PMDs and setting up
> migration PTEs.
>
> - Re-check the sum of mapcounts against page_count(). The page's
> counts are stable now. Return -EBUSY if the page is pinned.
>
> - Split compound page.
>
> - Unfreeze the page by removing migration entries.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> Tested-by: Sasha Levin <sasha.levin@...cle.com>
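
Restating the scheme above as code, just to make sure I read it right
(a rough sketch in my own words, not the patch code; the helper names
are illustrative only):

	static int split_huge_page_sketch(struct page *head)
	{
		/* Cheap check: any pin beyond the mappings and the caller's? */
		if (total_mapcount(head) + 1 != page_count(head))
			return -EBUSY;

		/* Freeze: split all PMDs, install migration PTEs */
		freeze_page(head);

		/* Counts are stable now; re-check for a racing GUP pin */
		if (total_mapcount(head) + 1 != page_count(head)) {
			unfreeze_page(head);
			return -EBUSY;
		}

		__split_compound(head);		/* split the compound page */
		unfreeze_page(head);		/* remove migration entries */
		return 0;
	}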
[...]
> +
> +static int __split_huge_page_tail(struct page *head, int tail,
> + struct lruvec *lruvec, struct list_head *list)
> +{
> + int mapcount;
> + struct page *page_tail = head + tail;
> +
> + mapcount = page_mapcount(page_tail);
Isn't page_mapcount() unnecessarily heavyweight here? When you are
splitting a page, it should already have a zero compound_mapcount()
and shouldn't be PageDoubleMap(), no? So you should only care about
page->_mapcount? Sure, splitting a THP is not a hotpath, but when done
512 times per split, it could make some difference in the split's
latency.
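Something like this is what I mean (an untested sketch; _mapcount is
stored offset by -1, hence the +1):

	/*
	 * Under split, compound_mapcount() is zero and the page is not
	 * PageDoubleMap(), so only the tail's own _mapcount matters.
	 */
	mapcount = atomic_read(&page_tail->_mapcount) + 1;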
> + VM_BUG_ON_PAGE(atomic_read(&page_tail->_count) != 0, page_tail);
> +
> + /*
> + * tail_page->_count is zero and not changing from under us. But
> + * get_page_unless_zero() may be running from under us on the
> + * tail_page. If we used atomic_set() below instead of atomic_add(), we
> + * would then run atomic_set() concurrently with
> + * get_page_unless_zero(), and atomic_set() is implemented in C not
> + * using locked ops. spin_unlock on x86 sometimes uses locked ops
> + * because of PPro errata 66, 92, so unless somebody can guarantee
> + * atomic_set() here would be safe on all archs (and not only on x86),
> + * it's safer to use atomic_add().
I would be surprised if this were the first place to use atomic_set()
with a potentially concurrent atomic_add(). Shouldn't the atomic_*()
API guarantee that this works?
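For clarity, the two variants under discussion (sketch):

	/* what the patch does: a locked RMW, safe against a concurrent
	 * atomic_inc_not_zero() from get_page_unless_zero() */
	atomic_add(page_mapcount(page_tail) + 1, &page_tail->_count);

	/* what the comment rejects: a plain store, relying only on
	 * _count being 0 and get_page_unless_zero() failing on 0 */
	atomic_set(&page_tail->_count, page_mapcount(page_tail) + 1);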
> + */
> + atomic_add(page_mapcount(page_tail) + 1, &page_tail->_count);
You already have the value in the mapcount variable, so why read it
again?
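I.e.:

	atomic_add(mapcount + 1, &page_tail->_count);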