linux-kernel - Re: [PATCH] thp: close race between split and zap huge pages

Message-ID: <CAA_GA1ecVD2GuxvPqBhGKdUfMeBJU+m-i5XeSzMmDXy=QncLqA@mail.gmail.com>
Date:	Wed, 16 Apr 2014 07:52:29 +0800
From:	Bob Liu <lliubbo@...il.com>
To:	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc:	Andrea Arcangeli <aarcange@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Michel Lespinasse <walken@...gle.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	Dave Jones <davej@...hat.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Linux-MM <linux-mm@...ck.org>,
	Linux-Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] thp: close race between split and zap huge pages

On Wed, Apr 16, 2014 at 5:48 AM, Kirill A. Shutemov
<kirill.shutemov@...ux.intel.com> wrote:
> Sasha Levin has reported two THP BUGs[1][2]. I believe both of them have
> the same root cause. Let's look at them one by one.
>
> The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!".
> It's BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page().
> From my testing I see that page_mapcount() is higher than mapcount here.
>
> I think it happens due to a race between zap_huge_pmd() and
> page_check_address_pmd(). page_check_address_pmd() misses a PMD
> which is being zapped:
>

Nice catch!

>         CPU0                                            CPU1
>                                                 zap_huge_pmd()
>                                                   pmdp_get_and_clear()
> __split_huge_page()
>   anon_vma_interval_tree_foreach()
>     __split_huge_page_splitting()
>       page_check_address_pmd()
>         mm_find_pmd()
>           /*
>            * We check if the PMD is present without taking ptl: no
>            * serialization against zap_huge_pmd(). We miss this PMD, so
>            * it's not counted in 'mapcount' in __split_huge_page().
>            */
>           pmd_present(pmd) == 0
>
>   BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!
>
>                                                   page_remove_rmap(page)
>                                                     atomic_add_negative(-1, &page->_mapcount)
>
> The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
> It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().
>
> This happens in a similar way:
>
>         CPU0                                            CPU1
>                                                 zap_huge_pmd()
>                                                   pmdp_get_and_clear()
>                                                   page_remove_rmap(page)
>                                                     atomic_add_negative(-1, &page->_mapcount)
> __split_huge_page()
>   anon_vma_interval_tree_foreach()
>     __split_huge_page_splitting()
>       page_check_address_pmd()
>         mm_find_pmd()
>           pmd_present(pmd) == 0 /* The same comment as above */
>   /*
>    * No crash this time since we already decremented page->_mapcount in
>    * zap_huge_pmd().
>    */
>   BUG_ON(mapcount != page_mapcount(page))
>
>   /*
>    * We split the compound page here into small pages without
>    * serialization against zap_huge_pmd()
>    */
>   __split_huge_page_refcount()
>                                                 VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!
>
> So my understanding is that the problem is the pmd_present() check in
> mm_find_pmd(), done without taking the page table lock.
>
> The bug was introduced by my commit 117b0791ac42. Sorry for
> that. :(
>
> Let's open-code mm_find_pmd() in page_check_address_pmd() and do the
> check under the page table lock.
>
> Note that __page_check_address() does the same for PTE entries
> if sync != 0.
>
> I've stress-tested the split and zap code paths for 36+ hours by now and
> see no crashes with the patch applied. Before the patch it took <20 min to
> trigger the first bug and a few hours for the second one (if we ignore
> the first).
>
> [1] https://lkml.kernel.org/g/<53440991.9090001@...cle.com>
> [2] https://lkml.kernel.org/g/<5310C56C.60709@...cle.com>
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> Reported-by: Sasha Levin <sasha.levin@...cle.com>
> Cc: <stable@...r.kernel.org> #3.13+
> ---
>  mm/huge_memory.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5025709bb3b5..d02a83852ee9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1536,16 +1536,23 @@ pmd_t *page_check_address_pmd(struct page *page,
>                               enum page_check_address_pmd_flag flag,
>                               spinlock_t **ptl)
>  {
> +       pgd_t *pgd;
> +       pud_t *pud;
>         pmd_t *pmd;
>
>         if (address & ~HPAGE_PMD_MASK)
>                 return NULL;
>
> -       pmd = mm_find_pmd(mm, address);
> -       if (!pmd)
> +       pgd = pgd_offset(mm, address);
> +       if (!pgd_present(*pgd))
>                 return NULL;
> +       pud = pud_offset(pgd, address);
> +       if (!pud_present(*pud))
> +               return NULL;
> +       pmd = pmd_offset(pud, address);
> +
>         *ptl = pmd_lock(mm, pmd);
> -       if (pmd_none(*pmd))
> +       if (!pmd_present(*pmd))
>                 goto unlock;

But I don't understand why the pmd_none() check was removed.

-- 
Regards,
--Bob