linux-kernel - Re: [PATCH 23/31] huge tmpfs recovery: framework for reconstituting huge pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 6 Apr 2016 19:05:20 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Mika Penttila <mika.penttila@...tfour.com>
cc:	Hugh Dickins <hughd@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Andres Lagar-Cavilla <andreslc@...gle.com>,
	Yang Shi <yang.shi@...aro.org>, Ning Qu <quning@...il.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 23/31] huge tmpfs recovery: framework for reconstituting
 huge pages

On Wed, 6 Apr 2016, Mika Penttila wrote:
> On 04/06/2016 12:53 AM, Hugh Dickins wrote:
> > +static void shmem_recovery_work(struct work_struct *work)
...
> > +
> > +	if (head) {
> > +		/* We are resuming work from a previous partial recovery */
> > +		if (PageTeam(page))
> > +			shr_stats(resume_teamed);
> > +		else
> > +			shr_stats(resume_tagged);
> > +	} else {
> > +		gfp_t gfp = mapping_gfp_mask(mapping);
> > +		/*
> > +		 * XXX: Note that with swapin readahead, page_to_nid(page) will
> > +		 * often choose an unsuitable NUMA node: something to fix soon,
> > +		 * but not an immediate blocker.
> > +		 */
> > +		head = __alloc_pages_node(page_to_nid(page),
> > +			gfp | __GFP_NOWARN | __GFP_THISNODE, HPAGE_PMD_ORDER);
> > +		if (!head) {
> > +			shr_stats(huge_failed);
> > +			error = -ENOMEM;
> > +			goto out;
> > +		}
> 
> Should this head marked PageTeam? Because in patch 27/31 when given as a hint to shmem_getpage_gfp() :
> 
>  		hugehint = NULL;
> +		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> +		    sgp == SGP_TEAM && *pagep) {
> +			struct page *head;
> +
> +			if (!get_page_unless_zero(*pagep)) {
> +				error = -ENOENT;
> +				goto decused;
> +			}
> +			page = *pagep;
> +			lock_page(page);
> +			head = page - (index & (HPAGE_PMD_NR-1));     
> 
> we fail always because :
> +			if (!PageTeam(head)) {
> +				error = -ENOENT;
> +				goto decused;
> +			}

Great observation, thank you Mika.

We don't fail always, because in most cases the page wanted for the head
will either be already in memory, or read in from swap, and that SGP_TEAM
block in shmem_getpage_gfp() (with the -ENOENT you show) not come into play
on it: then shmem_recovery_populate() does its !recovery->exposed_team
SetPageTeam(head) and all is well from then on.

But I think what you point out means that the current recovery code is
incapable of assembling a hugepage if its first page was not already
instantiated earlier: not something I'd realized until you showed me.
Not a common failing, and would never be the case for an extent which had
been mapped huge in the past, but it's certainly not what I'd intended.

As to whether the head should be marked PageTeam immediately after the
hugepage allocation: I think not, especially because of the swapin case
(26/31).  Swapin may need to read data from disk into that head page,
and I've never had to think about the consequences of having a swap
page marked PageTeam.  Perhaps it would work out okay, but I'd prefer
not to go there.

At this moment I'm too tired to think what the right answer will be,
and certainly won't be able to commit to any without some testing.

So, not as incapacitating as perhaps you thought, and not any danger
to people trying out huge tmpfs, but definitely something to be fixed:
I'll mull it over in the background and let you know when I'm sure.

Thank you again,
Hugh