Message-ID: <20181130233201.6yuzbhymtjddvf3u@ca-dmjordan1.us.oracle.com>
Date:   Fri, 30 Nov 2018 15:32:01 -0800
From:   Daniel Jordan <daniel.m.jordan@...cle.com>
To:     Huang Ying <ying.huang@...el.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Michal Hocko <mhocko@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Shaohua Li <shli@...nel.org>, Hugh Dickins <hughd@...gle.com>,
        Minchan Kim <minchan@...nel.org>,
        Rik van Riel <riel@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Zi Yan <zi.yan@...rutgers.edu>,
        Daniel Jordan <daniel.m.jordan@...cle.com>
Subject: Re: [PATCH -V7 RESEND 08/21] swap: Support to read a huge swap
 cluster for swapin a THP

Hi Ying,

On Tue, Nov 20, 2018 at 04:54:36PM +0800, Huang Ying wrote:
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 97831166994a..1eedbc0aede2 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -387,14 +389,42 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>  		 * as SWAP_HAS_CACHE.  That's done in later part of code or
>  		 * else swap_off will be aborted if we return NULL.
>  		 */
> -		if (!__swp_swapcount(entry) && swap_slot_cache_enabled)
> +		if (!__swp_swapcount(entry, &entry_size) &&
> +		    swap_slot_cache_enabled)
>  			break;
>  
>  		/*
>  		 * Get a new page to read into from swap.
>  		 */
> -		if (!new_page) {
> -			new_page = alloc_page_vma(gfp_mask, vma, addr);
> +		if (!new_page ||
> +		    (IS_ENABLED(CONFIG_THP_SWAP) &&
> +		     hpage_nr_pages(new_page) != entry_size)) {
> +			if (new_page)
> +				put_page(new_page);
> +			if (IS_ENABLED(CONFIG_THP_SWAP) &&
> +			    entry_size == HPAGE_PMD_NR) {
> +				gfp_t gfp;
> +
> +				gfp = alloc_hugepage_direct_gfpmask(vma, addr);

vma is NULL when we get here from try_to_unuse, so the kernel will die on
vma->vm_flags inside alloc_hugepage_direct_gfpmask.

try_to_unuse swaps in before it finds vmas, but even if that order were
reversed, it seems try_to_unuse wouldn't always have a single vma to pass into
this path: it walks the swap_map, and multiple processes mapping the same huge
page can have different huge page advice (and maybe mempolicies?), which
affects the result of alloc_hugepage_direct_gfpmask.  And yet
alloc_hugepage_direct_gfpmask needs a vma to do its job.  So, I'm not sure how
to fix this.

If the entry's usage count were 1, we could find the vma in that common case
and pass it to read_swap_cache_async, and otherwise allocate small pages.  We'd
get THPs some of the time while exactly following
alloc_hugepage_direct_gfpmask, and we'd be conservative when it's uncertain.
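
As a rough, untested user-space sketch of that first option (struct vma_model
and swapin_nr_pages are names made up for illustration, not kernel API, and
HPAGE_PMD_NR is hard-coded on the assumption of x86-64 with 4K base pages):

```c
#include <stddef.h>

#define HPAGE_PMD_NR 512	/* assumption: x86-64, 4K base pages */

/* Hypothetical stand-in for the kernel's struct vm_area_struct. */
struct vma_model {
	unsigned long vm_flags;
};

/*
 * Hypothetical helper: decide how many base pages to allocate for a swapin.
 *
 * swap_count: number of references to the swap entry
 * entry_size: 1 for a normal entry, HPAGE_PMD_NR for a huge swap cluster
 * vma:        the single mapping vma if one was found, NULL otherwise
 */
int swapin_nr_pages(int swap_count, int entry_size,
		    const struct vma_model *vma)
{
	/* Not a huge swap cluster: nothing to decide. */
	if (entry_size != HPAGE_PMD_NR)
		return 1;

	/* Exactly one user and we know its vma: safe to consult its hints. */
	if (swap_count == 1 && vma != NULL)
		return HPAGE_PMD_NR;

	/* Shared entry or unknown vma: be conservative, use small pages. */
	return 1;
}
```

In the real path the huge case would still have to go through
alloc_hugepage_direct_gfpmask with the vma that was found; the sketch only
covers the decision of whether to try.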

Or, if the system-wide THP settings allow it, go for it; otherwise ignore vma
hints and always fall back to small pages.  This requires another way of
controlling THP allocations besides alloc_hugepage_direct_gfpmask.
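
A minimal, untested sketch of how that second option might look, again as a
user-space model.  The enum and function names are hypothetical, and treating
only an "always" setting as permission is my assumption about what "allow it"
would mean when there is no vma to consult:

```c
#include <stdbool.h>

/* Hypothetical model of the system-wide THP "enabled" setting. */
enum thp_mode_model {
	THP_MODEL_NEVER,
	THP_MODEL_MADVISE,
	THP_MODEL_ALWAYS,
};

/*
 * Hypothetical helper: may a swapin with no vma allocate a THP?
 *
 * With "madvise" the decision depends on a per-vma hint we don't have,
 * so only "always" is treated as permission; everything else falls back
 * to small pages.
 */
bool swapin_may_use_thp(enum thp_mode_model mode, bool entry_is_huge)
{
	return entry_is_huge && mode == THP_MODEL_ALWAYS;
}
```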

Or maybe try_to_unuse shouldn't allocate hugepages at all, but then
try_to_unuse gets no perf improvement.

What do you think?
