Message-ID: <4d067a99-1112-3b3d-bedf-35c1124904fd@redhat.com>
Date: Wed, 31 Aug 2022 18:31:23 +0200
From: David Hildenbrand <david@...hat.com>
To: Peter Xu <peterx@...hat.com>
Cc: John Hubbard <jhubbard@...dia.com>,
Jason Gunthorpe <jgg@...dia.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...e.de>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Hugh Dickins <hughd@...gle.com>,
Alistair Popple <apopple@...dia.com>
Subject: Re: [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in
GUP-fast
[...]
>> + /*
>> + * We have to make sure that while we clear PageAnonExclusive, the
>> + * page is not pinned and that concurrent GUP-fast won't succeed in
>> + * concurrently pinning the page.
>> + *
>> + * Conceptually, GUP-fast pinning code of anon pages consists of:
>> + * (1) Read the PTE
>> + * (2) Pin the mapped page
>> + * (3) Check if the PTE changed by re-reading it; back off if so.
>> + * (4) Check if PageAnonExclusive is not set; back off if so.
>> + *
>> + * Conceptually, PageAnonExclusive clearing code consists of:
>> + * (1) Clear PTE
>> + * (2) Check if the page is pinned; back off if so.
>> + * (3) Clear PageAnonExclusive
>> + * (4) Restore PTE (optional)
>> + *
>> + * In GUP-fast, we have to make sure that (2), (3) and (4) happen in
>> + * the right order. Memory order between (2) and (3) is handled by
>> + * GUP-fast, independent of PageAnonExclusive.
>> + *
>> + * When clearing PageAnonExclusive(), we have to make sure that (1),
>> + * (2), (3) and (4) happen in the right order.
>> + *
>> + * Note that (4) has to happen after (3) in both cases to handle the
>> + * corner case whereby the PTE is restored to the original value after
>> + * clearing PageAnonExclusive: GUP-fast might then fail to detect the
>> + * PTE change, but it will detect the PageAnonExclusive change.
>> + *
>> + * We assume that there might not be a memory barrier after
>> + * clearing/invalidating the PTE (1) and before restoring the PTE (4),
>> + * so we use explicit ones here.
>> + *
>> + * These memory barriers are paired with memory barriers in GUP-fast
>> + * code, including gup_must_unshare().
>> + */
>> +
>> + /* Clear/invalidate the PTE before checking for PINs. */
>> + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
>> + smp_mb();
>
> Wondering whether this could be smp_mb__before_atomic().
We'll read via atomic_read(). That's a non-RMW operation, and
smp_mb__before_atomic() only applies to RMW (read-modify-write)
operations.
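
To make the distinction concrete, here is a minimal sketch (not part of
the patch; the helper name is made up). Per Documentation/atomic_t.txt,
smp_mb__before_atomic() only orders against a following non-value-returning
RMW atomic such as atomic_inc(), while page_maybe_dma_pinned() boils down
to an atomic_read(), i.e. a plain load, which needs a standalone smp_mb():

#include <linux/atomic.h>
#include <linux/mm.h>		/* page_maybe_dma_pinned() */

/* Illustrative only; demo_check_pinned() is a hypothetical helper. */
static int demo_check_pinned(struct page *page, atomic_t *counter)
{
	/* RMW case: this pairing is valid. */
	smp_mb__before_atomic();
	atomic_inc(counter);	/* non-value-returning RMW atomic */

	/*
	 * Non-RMW case: atomic_read() is a plain load, so
	 * smp_mb__before_atomic() would be insufficient (it may be a
	 * no-op on architectures whose RMW atomics are fully ordered).
	 */
	smp_mb();
	if (unlikely(page_maybe_dma_pinned(page)))
		return -EBUSY;
	return 0;
}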
I have an updated patch with an improved description/comments; it
includes the following explanation/example and shows how the two
barrier pairs interact:
Thread 0 (KSM)                    Thread 1 (GUP-fast)

                                  (B1) Read the PTE
                                  # (B2) skipped without FOLL_WRITE
(A1) Clear PTE
smp_mb()
(A2) Check pinned
                                  (B3) Pin the mapped page
                                  smp_mb()
(A3) Clear PageAnonExclusive
smp_wmb()
(A4) Restore PTE
                                  (B4) Check if the PTE changed
                                  smp_rmb()
                                  (B5) Check PageAnonExclusive
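
For completeness, the GUP-fast side (B1..B5) could be sketched as below.
This is a hedged illustration of the ordering the diagram assumes, not the
actual gup.c code; pin_page_sketch() is a hypothetical stand-in for the
real FOLL_PIN fast path:

#include <linux/mm.h>
#include <linux/pgtable.h>

/* Hypothetical helper standing in for the real GUP-fast pin path;
 * assumed to take a FOLL_PIN reference or fail. */
extern bool pin_page_sketch(struct page *page);

static int gup_fast_anon_sketch(pte_t *ptep, struct page *page)
{
	pte_t pte = ptep_get(ptep);		/* (B1) */

	/* (B2) is skipped without FOLL_WRITE */

	if (!pin_page_sketch(page))		/* (B3) */
		return -EBUSY;

	/* Pairs with the smp_mb() between (A1) and (A2). */
	smp_mb();

	if (!pte_same(ptep_get(ptep), pte)) {	/* (B4) */
		unpin_user_page(page);
		return -EAGAIN;
	}

	/* Pairs with the smp_wmb() between (A3) and (A4). */
	smp_rmb();

	if (!PageAnonExclusive(page)) {		/* (B5) */
		unpin_user_page(page);
		return -EAGAIN;
	}
	return 0;
}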
>
>> +
>> + if (unlikely(page_maybe_dma_pinned(page)))
>> + return -EBUSY;
>> ClearPageAnonExclusive(page);
>> +
>> + /* Clear PageAnonExclusive() before eventually restoring the PTE. */
>> + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
>> + smp_mb__after_atomic();
>> return 0;
>> }
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index e9414ee57c5b..2aef8d76fcf2 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2140,6 +2140,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>> *
>> * In case we cannot clear PageAnonExclusive(), split the PMD
>> * only and let try_to_migrate_one() fail later.
>> + *
>> + * See page_try_share_anon_rmap(): invalidate PMD first.
>> */
>> anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
>> if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
>> @@ -3177,6 +3179,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>> flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
>> pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
>>
>> + /* See page_try_share_anon_rmap(): invalidate PMD first. */
>> anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
>> if (anon_exclusive && page_try_share_anon_rmap(page)) {
>> set_pmd_at(mm, address, pvmw->pmd, pmdval);
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index d7526c705081..971cf923c0eb 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -1091,6 +1091,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
>> goto out_unlock;
>> }
>>
>> + /* See page_try_share_anon_rmap(): clear PTE first. */
>> if (anon_exclusive && page_try_share_anon_rmap(page)) {
>> set_pte_at(mm, pvmw.address, pvmw.pte, entry);
>> goto out_unlock;
>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> index 27fb37d65476..47e955212f15 100644
>> --- a/mm/migrate_device.c
>> +++ b/mm/migrate_device.c
>> @@ -193,20 +193,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>> bool anon_exclusive;
>> pte_t swp_pte;
>>
>
> flush_cache_page() missing here?
Hmm, wouldn't that already be missing on the !anon path right now?
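
(If it does turn out to be needed on both paths, I'd presume it would sit
right before the PTE is cleared; a hedged sketch only, reusing the local
names from migrate_vma_collect_pmd() for illustration:

	/* Flush before clearing, if required on this architecture. */
	flush_cache_page(vma, addr, pte_pfn(pte));
	pte = ptep_get_and_clear(mm, addr, ptep);

But that would be a separate cleanup either way.)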
>
> Better copy Alistair too when post formally since this will have a slight
> conflict with the other thread.
Yes, I'll give him a heads-up right away; the full patch is at
https://lkml.kernel.org/r/68b38ac4-c680-b694-21a9-1971396d63b9@redhat.com
Thanks for having a look, Peter!
--
Thanks,
David / dhildenb