lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 24 Jul 2013 11:44:28 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Michal Hocko <mhocko@...e.cz>
Cc:	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	linux-mm@...ck.org, Rik van Riel <riel@...hat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Hillf Danton <dhillf@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: hugepage related lockdep trace.

On Tue, Jul 23, 2013 at 04:01:20PM +0200, Michal Hocko wrote:
> On Fri 19-07-13 09:13:03, Minchan Kim wrote:
> > On Thu, Jul 18, 2013 at 11:12:24PM +0530, Aneesh Kumar K.V wrote:
> [...]
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 83aff0a..2cb1be3 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -3266,8 +3266,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
> > >  		put_page(virt_to_page(spte));
> > >  	spin_unlock(&mm->page_table_lock);
> > >  out:
> > > -	pte = (pte_t *)pmd_alloc(mm, pud, addr);
> > >  	mutex_unlock(&mapping->i_mmap_mutex);
> > > +	pte = (pte_t *)pmd_alloc(mm, pud, addr);
> > >  	return pte;
> > 
> > I am blind on hugetlb but not sure it doesn't break eb48c071.
> > Michal?
> 
> Well, it is some time since I debugged the race and all the details
> vanished in the meantime. But this part of the changelog suggests that
> this indeed breaks the fix:
> "
>     This patch addresses the issue by moving pmd_alloc into huge_pmd_share
>     which guarantees that the shared pud is populated in the same critical
>     section as pmd.  This also means that huge_pte_offset test in
>     huge_pmd_share is serialized correctly now which in turn means that the
>     success of the sharing will be higher as the racing tasks see the pud
>     and pmd populated together.
> "
> 
> Besides that I fail to see how moving pmd_alloc down changes anything.
> Even if pmd_alloc triggered reclaim then we cannot trip over the same
> i_mmap_mutex as hugetlb pages are not reclaimable because they are not
> on the LRU.

I thought we could map some part of binary with normal page and other part
of the one with MAP_HUGETLB so that a address space could have both normal
page and HugeTLB page. Okay, it's impossible so HugeTLB pages are not on LRU.
Then, above lockdep warning is totally false positive.
Best solution is avoiding pmd_alloc with holding i_mmap_mutex but we need it
to fix eb48c071 so how about this if we couldn't have a better idea?


diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 83aff0a..e7c3a15 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3240,7 +3240,15 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	if (!vma_shareable(vma, addr))
 		return (pte_t *)pmd_alloc(mm, pud, addr);
 
-	mutex_lock(&mapping->i_mmap_mutex);
+	/*
+	 * It annotates to shut lockdep's warning up casued by i_mmap_mutex
+	 * Below pmd_alloc try to allocate memory with GFP_KERNEL while
+	 * holding i_mmap_mutex so that it could enter direct reclaim path
+	 * that rmap try to hold i_mmap_mutex again. But it's no problem
+	 * for hugetlb because pages on hugetlb never could live in LRU so
+	 * it's false positive. I hope someone fixes it with avoiding pmd_alloc
+        * with holding i_mmap_mutex rather than nesting annotation.
+	 */
+	mutex_lock_nested(&mapping->i_mmap_mutex, SINGLE_DEPTH_NESTING);
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
 			continue;

> -- 
> Michal Hocko
> SUSE Labs
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ