Message-ID: <9a4fe603-950e-785b-6281-2e309256463f@nvidia.com>
Date:   Tue, 30 Aug 2022 12:18:41 -0700
From:   John Hubbard <jhubbard@...dia.com>
To:     David Hildenbrand <david@...hat.com>,
        Jason Gunthorpe <jgg@...dia.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Hugh Dickins <hughd@...gle.com>, Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in
 GUP-fast

On 8/30/22 11:53, David Hildenbrand wrote:
> Good, I managed to attract the attention of someone who understands that machinery :)
> 
> While validating whether GUP-fast and PageAnonExclusive code work correctly,
> I started looking at the whole RCU GUP-fast machinery. I do have a patch to
> improve PageAnonExclusive clearing (I think we're missing memory barriers to
> make it work as expected in any possible case), but I also stumbled eventually
> over a more generic issue that might need memory barriers.
> 
> Any thoughts whether I am missing something or this is actually missing
> memory barriers?
> 

It's actually missing memory barriers.

In fact, others have had that same thought! [1] :) In that 2019 thread,
I recall that this got dismissed because of a focus on the IPI-based
aspect of gup fast synchronization (there was some hand waving, perhaps
accurate waving, about memory barriers vs. CPU interrupts). But now the
RCU (non-IPI) implementation is more widely used than it used to be, the
issue is clearer.

> 
> From ce8c941c11d1f60cea87a3e4d941041dc6b79900 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@...hat.com>
> Date: Mon, 29 Aug 2022 16:57:07 +0200
> Subject: [PATCH] mm/gup: update refcount+pincount before testing if the PTE
>  changed
> 
> mm/ksm.c:write_protect_page() has to make sure that no unknown
> references to a mapped page exist and that no additional ones with write
> permissions are possible -- unknown references could have write permissions
> and modify the page afterwards.
> 
> Conceptually, mm/ksm.c:write_protect_page() consists of:
>   (1) Clear/invalidate PTE
>   (2) Check if there are unknown references; back off if so.
>   (3) Update PTE (e.g., map it R/O)
> 
> Conceptually, GUP-fast code consists of:
>   (1) Read the PTE
>   (2) Increment refcount/pincount of the mapped page
>   (3) Check if the PTE changed by re-reading it; back off if so.
> 
> To make sure GUP-fast won't be able to grab additional references after
> clearing the PTE, but will properly detect the change and back off, we
> need a memory barrier between updating the refcount/pincount and checking
> if it changed.
> 
> try_grab_folio() doesn't necessarily imply a memory barrier, so add an
> explicit smp_mb__after_atomic() after the atomic RMW operation to
> increment the refcount and pincount.
> 
> ptep_clear_flush() used to clear the PTE and flush the TLB should imply
> a memory barrier for flushing the TLB, so don't add another one for now.
> 
> PageAnonExclusive handling requires further care and will be handled
> separately.
> 
> Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
> Signed-off-by: David Hildenbrand <david@...hat.com>
> ---
>  mm/gup.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 5abdaf487460..0008b808f484 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2392,6 +2392,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  			goto pte_unmap;
>  		}
>  
> +		/*
> +		 * Update refcount/pincount before testing for changed PTE. This
> +		 * is required for code like mm/ksm.c:write_protect_page() that
> +		 * wants to make sure that a page has no unknown references
> +		 * after clearing the PTE.
> +		 */
> +		smp_mb__after_atomic();
> +
>  		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
>  			gup_put_folio(folio, 1, flags);
>  			goto pte_unmap;
> @@ -2577,6 +2585,9 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
>  	if (!folio)
>  		return 0;
>  
> +	/* See gup_pte_range(). */

Don't we usually also identify what each mb pairs with, in the comments? That would help.
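
Something along these lines, maybe (wording is just a sketch; the
pairing site named here is the ptep_clear_flush() that the commit
message identifies as providing the barrier on the KSM side):

```c
	/*
	 * Pairs with the barrier implied by ptep_clear_flush() in
	 * mm/ksm.c:write_protect_page(): the refcount/pincount update
	 * above must be visible before we re-read the PTE below.
	 */
```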

> +	smp_mb__after_atomic();
> +
>  	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
>  		gup_put_folio(folio, refs, flags);
>  		return 0;
> @@ -2643,6 +2654,9 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>  	if (!folio)
>  		return 0;
>  
> +	/* See gup_pte_range(). */
> +	smp_mb__after_atomic();
> +
>  	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
>  		gup_put_folio(folio, refs, flags);
>  		return 0;
> @@ -2683,6 +2697,9 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>  	if (!folio)
>  		return 0;
>  
> +	/* See gup_pte_range(). */
> +	smp_mb__after_atomic();
> +
>  	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
>  		gup_put_folio(folio, refs, flags);
>  		return 0;


[1] https://lore.kernel.org/lkml/9465df76-0229-1b44-5646-5cced1bc1718@nvidia.com/


thanks,

-- 
John Hubbard
NVIDIA
