linux-kernel - Re: [PATCH] thp: close race between split and zap huge pages

Message-Id: <20140416131942.aaf8e560e45062c9857a2648@linux-foundation.org>
Date:	Wed, 16 Apr 2014 13:19:42 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc:	Andrea Arcangeli <aarcange@...hat.com>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Michel Lespinasse <walken@...gle.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	Dave Jones <davej@...hat.com>,
	Vlastimil Babka <vbabka@...e.cz>, Bob Liu <lliubbo@...il.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	<stable@...r.kernel.org>
Subject: Re: [PATCH] thp: close race between split and zap huge pages

On Wed, 16 Apr 2014 00:48:35 +0300 "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> wrote:

> Sasha Levin has reported two THP BUGs[1][2]. I believe both of them have
> the same root cause. Let's look at them one by one.
> 
> The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!".
> It's BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page().
> From my testing I see that page_mapcount() is higher than mapcount here.
> 
> I think it happens due to race between zap_huge_pmd() and
> page_check_address_pmd(). page_check_address_pmd() misses PMD
> which is under zap:

Why did this bug happen?

In other words, what earlier mistakes had we made which led to you
getting this locking wrong?  

Based on that knowledge, what can we do to reduce the likelihood of
such mistakes being made in the future?  (Hint: the answer to this
will involve making changes to this patch).

> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1536,16 +1536,23 @@ pmd_t *page_check_address_pmd(struct page *page,
>  			      enum page_check_address_pmd_flag flag,
>  			      spinlock_t **ptl)
>  {
> +	pgd_t *pgd;
> +	pud_t *pud;
>  	pmd_t *pmd;
>  
>  	if (address & ~HPAGE_PMD_MASK)
>  		return NULL;
>  
> -	pmd = mm_find_pmd(mm, address);
> -	if (!pmd)
> +	pgd = pgd_offset(mm, address);
> +	if (!pgd_present(*pgd))
>  		return NULL;
> +	pud = pud_offset(pgd, address);
> +	if (!pud_present(*pud))
> +		return NULL;
> +	pmd = pmd_offset(pud, address);
> +
>  	*ptl = pmd_lock(mm, pmd);
> -	if (pmd_none(*pmd))
> +	if (!pmd_present(*pmd))
>  		goto unlock;
>  	if (pmd_page(*pmd) != page)
>  		goto unlock;

So how do other callers of mm_find_pmd() manage to avoid this race, or
are they all buggy?

Is mm_find_pmd() really so simple and obvious that we can afford to
leave it undocumented?
