lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4d067a99-1112-3b3d-bedf-35c1124904fd@redhat.com>
Date:   Wed, 31 Aug 2022 18:31:23 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     John Hubbard <jhubbard@...dia.com>,
        Jason Gunthorpe <jgg@...dia.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Hugh Dickins <hughd@...gle.com>,
        Alistair Popple <apopple@...dia.com>
Subject: Re: [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in
 GUP-fast

[...]

>> +	/*
>> +	 * We have to make sure that while we clear PageAnonExclusive, that
>> +	 * the page is not pinned and that concurrent GUP-fast won't succeed in
>> +	 * concurrently pinning the page.
>> +	 *
>> +	 * Conceptually, GUP-fast pinning code of anon pages consists of:
>> +	 * (1) Read the PTE
>> +	 * (2) Pin the mapped page
>> +	 * (3) Check if the PTE changed by re-reading it; back off if so.
>> +	 * (4) Check if PageAnonExclusive is not set; back off if so.
>> +	 *
>> +	 * Conceptually, PageAnonExclusive clearing code consists of:
>> +	 * (1) Clear PTE
>> +	 * (2) Check if the page is pinned; back off if so.
>> +	 * (3) Clear PageAnonExclusive
>> +	 * (4) Restore PTE (optional)
>> +	 *
>> +	 * In GUP-fast, we have to make sure that (2),(3) and (4) happen in
>> +	 * the right order. Memory order between (2) and (3) is handled by
>> +	 * GUP-fast, independent of PageAnonExclusive.
>> +	 *
>> +	 * When clearing PageAnonExclusive(), we have to make sure that (1),
>> +	 * (2), (3) and (4) happen in the right order.
>> +	 *
>> +	 * Note that (4) has to happen after (3) in both cases to handle the
>> +	 * corner case whereby the PTE is restored to the original value after
>> +	 * clearing PageAnonExclusive and while GUP-fast might not detect the
>> +	 * PTE change, it will detect the PageAnonExclusive change.
>> +	 *
>> +	 * We assume that there might not be a memory barrier after
>> +	 * clearing/invalidating the PTE (1) and before restoring the PTE (4),
>> +	 * so we use explicit ones here.
>> +	 *
>> +	 * These memory barriers are paired with memory barriers in GUP-fast
>> +	 * code, including gup_must_unshare().
>> +	 */
>> +
>> +	/* Clear/invalidate the PTE before checking for PINs. */
>> +	if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
>> +		smp_mb();
> 
> Wondering whether this could be smp_mb__before_atomic().

We'll read via atomic_read().

That's a non-RMW operation. smp_mb__before_atomic() only applies to
RMW (Read Modify Write) operations.



I have an updated patch with improved description/comments, that includes the
following explanation/example and showcases how the two barrier pairs
interact:


    
            Thread 0 (KSM)                  Thread 1 (GUP-fast)
    
                                            (B1) Read the PTE
                                            # (B2) skipped without FOLL_WRITE
            (A1) Clear PTE
            smb_mb()
            (A2) Check pinned
                                            (B3) Pin the mapped page
                                            smb_mb()
            (A3) Clear PageAnonExclusive
            smb_wmb()
            (A4) Restore PTE
                                            (B4) Check if the PTE changed
                                            smb_rmb()
                                            (B5) Check PageAnonExclusive


> 
>> +
>> +	if (unlikely(page_maybe_dma_pinned(page)))
>> +		return -EBUSY;
>>  	ClearPageAnonExclusive(page);
>> +
>> +	/* Clear PageAnonExclusive() before eventually restoring the PTE. */
>> +	if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
>> +		smp_mb__after_atomic();
>>  	return 0;
>>  }
>>  
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index e9414ee57c5b..2aef8d76fcf2 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2140,6 +2140,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>  		 *
>>  		 * In case we cannot clear PageAnonExclusive(), split the PMD
>>  		 * only and let try_to_migrate_one() fail later.
>> +		 *
>> +		 * See page_try_share_anon_rmap(): invalidate PMD first.
>>  		 */
>>  		anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
>>  		if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
>> @@ -3177,6 +3179,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>>  	flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
>>  	pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
>>  
>> +	/* See page_try_share_anon_rmap(): invalidate PMD first. */
>>  	anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
>>  	if (anon_exclusive && page_try_share_anon_rmap(page)) {
>>  		set_pmd_at(mm, address, pvmw->pmd, pmdval);
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index d7526c705081..971cf923c0eb 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -1091,6 +1091,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
>>  			goto out_unlock;
>>  		}
>>  
>> +		/* See page_try_share_anon_rmap(): clear PTE first. */
>>  		if (anon_exclusive && page_try_share_anon_rmap(page)) {
>>  			set_pte_at(mm, pvmw.address, pvmw.pte, entry);
>>  			goto out_unlock;
>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> index 27fb37d65476..47e955212f15 100644
>> --- a/mm/migrate_device.c
>> +++ b/mm/migrate_device.c
>> @@ -193,20 +193,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>  			bool anon_exclusive;
>>  			pte_t swp_pte;
>>  
> 
> flush_cache_page() missing here?

Hmm, wouldn't that already be missing on the !anon path right now?

> 
> Better copy Alistair too when post formally since this will have a slight
> conflict with the other thread.

Yes, I'll give him a heads-up right away: full patch in
https://lkml.kernel.org/r/68b38ac4-c680-b694-21a9-1971396d63b9@redhat.com


Thanks for having a look Peter1

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ