linux-kernel - Re: [PATCH] mm/hugetlb: Fix pgtable lock on pmd sharing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230612204418.GD3704@monkey>
Date:   Mon, 12 Jun 2023 13:44:18 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Muchun Song <songmuchun@...edance.com>,
        Naoya Horiguchi <naoya.horiguchi@....com>
Subject: Re: [PATCH] mm/hugetlb: Fix pgtable lock on pmd sharing

On 06/12/23 12:04, Peter Xu wrote:
> Huge pmd sharing operates on PUD not PMD, huge_pte_lock() is not suitable
> in this case because it should only work for last level pte changes, while
> pmd sharing is always one level higher.

Right!  That lock does not prevent someone else from concurrently modifying
the PUD.

> Meanwhile, here we're locking over the spte pgtable lock which is even not
> a lock for current mm but someone else's.
> 
> It seems even racy on operating on the lock, as after put_page() of the
> spte pgtable page logically the page can be released, so at least the
> spin_unlock() needs to be done after the put_page().

Agree.

> No report I am aware, I'm not even sure whether it'll just work on taking
> the spte pmd lock, because while we're holding i_mmap read lock it probably
> means the vma interval tree is frozen, all pte allocators over this pud
> entry could always find the specific svma and spte page, so maybe they'll
> serialize on this spte page lock?

It seems they would serialize IF they were trying to instantiate the same
shared page.  However, I suppose another thread could be trying to
instantiate something totally different in the VA range represented by that
PUD.  In this case, it seems like there would be no synchronization.

>                                    Even so, doesn't seem to be expected.
> It just seems to be an accident of cb900f412154.
> 
> Fix it with the proper pud lock (which is the mm's page_table_lock).
> 
> Cc: Mike Kravetz <mike.kravetz@...cle.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@....com>
> Fixes: cb900f412154 ("mm, hugetlb: convert hugetlbfs to use split pmd lock")
> Signed-off-by: Peter Xu <peterx@...hat.com>
> ---
>  mm/hugetlb.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)

Agree with this change.  But, it does make one wonder if the pud_clear()
in huge_pmd_unshare is safe.  Like here, we really should be holding the
higher level lock but are holding the PMD lock.
-- 
Mike Kravetz

> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index dfa412d8cb30..270ec0ecd5a1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7133,7 +7133,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  	unsigned long saddr;
>  	pte_t *spte = NULL;
>  	pte_t *pte;
> -	spinlock_t *ptl;
>  
>  	i_mmap_lock_read(mapping);
>  	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
> @@ -7154,7 +7153,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  	if (!spte)
>  		goto out;
>  
> -	ptl = huge_pte_lock(hstate_vma(vma), mm, spte);
> +	spin_lock(&mm->page_table_lock);
>  	if (pud_none(*pud)) {
>  		pud_populate(mm, pud,
>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
> @@ -7162,7 +7161,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  	} else {
>  		put_page(virt_to_page(spte));
>  	}
> -	spin_unlock(ptl);
> +	spin_unlock(&mm->page_table_lock);
>  out:
>  	pte = (pte_t *)pmd_alloc(mm, pud, addr);
>  	i_mmap_unlock_read(mapping);
> -- 
> 2.40.1
>