Message-ID: <alpine.LSU.2.11.1409292041540.4640@eggly.anvils>
Date:	Mon, 29 Sep 2014 21:32:20 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Hugh Dickins <hughd@...gle.com>,
	David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Naoya Horiguchi <nao.horiguchi@...il.com>,
	stable@...r.kernel.org
Subject: Re: [PATCH v3 2/5] mm/hugetlb: take page table lock in
 follow_huge_pmd()

On Mon, 15 Sep 2014, Naoya Horiguchi wrote:
> We have a race condition between move_pages() and freeing hugepages,

I've been looking through these 5 today, and they're much better now,
thank you.  But a new concern below, and a minor correction to 3/5.

> --- mmotm-2014-09-09-14-42.orig/mm/gup.c
> +++ mmotm-2014-09-09-14-42/mm/gup.c
> @@ -162,33 +162,16 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
>  
>  	pmd = pmd_offset(pud, address);
>  	if (pmd_none(*pmd))
>  		return no_page_table(vma, flags);
> -	if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
> -		page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
> -		if (flags & FOLL_GET) {
> -			/*
> -			 * Refcount on tail pages are not well-defined and
> -			 * shouldn't be taken. The caller should handle a NULL
> -			 * return when trying to follow tail pages.
> -			 */
> -			if (PageHead(page))
> -				get_page(page);
> -			else
> -				page = NULL;
> -		}
> -		return page;
> -	}
> +	if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB)
> +		return follow_huge_pmd(mm, address, pmd, flags);

This code here allows for pmd_none() and pmd_huge(), and for pmd_numa()
and pmd_trans_huge() below; but makes no explicit allowance for !present
migration and hwpoison entries.

Is it assumed that the pmd_bad() test in follow_page_pte() will catch
those?  But what of races?  Migration entries are highly volatile.  And
is it assumed that a migration entry cannot pass the pmd_huge() test?

That may be true of x86 today, I'm not certain; but if the soft-dirty
people catch up with the hugetlb-migration people, that might change
(they #define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE).
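
If I read arch/x86/mm/hugetlbpage.c right, pmd_huge() amounts to no
more than a PSE test, something like

	int pmd_huge(pmd_t pmd)
	{
		return !!(pmd_val(pmd) & _PAGE_PSE);
	}

so a swap-format entry with _PAGE_SWP_SOFT_DIRTY (== _PAGE_PSE) set
would then satisfy pmd_huge() despite being !present.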

Why pmd_huge() does not itself test for present, I cannot say; but it
probably didn't matter at all before hwpoison and migration were added.
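
If it were to test for present, I suppose it would look something like
this (untested, and I've not checked whether any caller relies on
today's laxer behaviour):

	int pmd_huge(pmd_t pmd)
	{
		return (pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) ==
				      (_PAGE_PRESENT|_PAGE_PSE);
	}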

Mind you, with __get_user_pages()'s is_vm_hugetlb_page() test avoiding
all this code, maybe the only thing that can stumble here is your own
hugetlb migration code; but that appears to be guarded only by
down_read of mmap_sem, so races would be possible (if userspace
is silly enough or malicious enough to provoke them).

What we have here today looks too fragile to me, but it's probably
best dealt with by a separate patch.
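
Roughly the shape I have in mind, not even compiled, with
follow_huge_pmd() doing all of its checking under the lock, and NULL
from it taken to mean "nothing to follow here":

	if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
		page = follow_huge_pmd(mm, address, pmd, flags);
		if (page)
			return page;
		/* Raced with unmap, or a migration or hwpoison entry */
		return no_page_table(vma, flags);
	}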

Or I may be over-anxious, and there may be something "obvious"
that I'm missing, which saves us from further change.

>  	if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
>  		return no_page_table(vma, flags);
>  	if (pmd_trans_huge(*pmd)) {
> diff --git mmotm-2014-09-09-14-42.orig/mm/hugetlb.c mmotm-2014-09-09-14-42/mm/hugetlb.c
> index 34351251e164..941832ee3d5a 100644
> --- mmotm-2014-09-09-14-42.orig/mm/hugetlb.c
> +++ mmotm-2014-09-09-14-42/mm/hugetlb.c
> @@ -3668,26 +3668,34 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address,
>  
>  struct page * __weak
>  follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> -		pmd_t *pmd, int write)
> +		pmd_t *pmd, int flags)
>  {
> -	struct page *page;
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
>  
> -	page = pte_page(*(pte_t *)pmd);
> -	if (page)
> -		page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
> +	ptl = pmd_lockptr(mm, pmd);
> +	spin_lock(ptl);
> +
> +	if (!pmd_huge(*pmd))
> +		goto out;

And similarly here.  Though at least here we now have the necessary
lock, so it's no longer racy, and maybe this pmd_huge() test just needs
to be replaced by a pmd_present() test?  Or are both needed?
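
Say, something like (sketch only, assuming pmd_present() does what I
expect on a hugetlb pmd):

	if (!pmd_huge(*pmd) || !pmd_present(*pmd))
		goto out;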

> +
> +	page = pte_page(*(pte_t *)pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT);
> +
> +	if (flags & FOLL_GET)
> +		get_page(page);
> +out:
> +	spin_unlock(ptl);
>  	return page;
>  }