Message-ID: <887ca2e2-a7c5-93a7-46cb-185daccd4444@redhat.com>
Date: Thu, 25 Aug 2022 09:25:31 +0200
From: David Hildenbrand <david@...hat.com>
To: Mike Kravetz <mike.kravetz@...cle.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>
Cc: akpm@...ux-foundation.org, songmuchun@...edance.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE
size hugetlb page
> Is the primary concern the locking? If so, I am not sure we have an issue.
> As mentioned in your commit message, the current code will use
> pte_offset_map_lock(). pte_offset_map_lock() uses pte_lockptr(), which
> returns either the mm-wide lock or the pmd_page lock. To me, it seems that
> either would provide correct synchronization for CONT-PTE entries. Am I
> missing something or misreading the code?
>
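IIRC, the lock selection you describe boils down to roughly the following
(a simplified sketch of the mainline helper; the exact code depends on
USE_SPLIT_PTE_PTLOCKS and the kernel version):

	/* include/linux/mm.h, simplified */
	static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
	{
	#if USE_SPLIT_PTE_PTLOCKS
		/* split PTE ptlocks: one lock per page-table page */
		return ptlock_ptr(pmd_page(*pmd));
	#else
		/* otherwise: the single mm-wide lock */
		return &mm->page_table_lock;
	#endif
	}

Either way, all CONT-PTE entries of a given huge page live in the same PTE
page, so they should end up serialized by the same lock.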
> I started looking at code cleanup suggested by David. Here is a quick
> patch (not tested and likely containing errors) to see if this is a step
> in the right direction.
>
> I like it because we get rid of/combine all those follow_huge_p*d
> routines.
>
Yes, see comments below.
> From 35d117a707c1567ddf350554298697d40eace0d7 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@...cle.com>
> Date: Wed, 24 Aug 2022 15:59:15 -0700
> Subject: [PATCH] hugetlb: call hugetlb_follow_page_mask for hugetlb pages in
> follow_page_mask
>
> At the beginning of follow_page_mask, there currently is a call to
> follow_huge_addr which 'may' handle hugetlb pages. ia64 is the only
> architecture which (incorrectly) provides a follow_huge_addr routine
> that does not return an error. Instead, at each level of the page table a
> check is made for a hugetlb entry. If a hugetlb entry is found, a call
> is made to a routine associated with that page table level, such as
> follow_huge_pmd.
>
> All the follow_huge_p*d routines are basically the same. In addition,
> the huge page size can be derived from the vma, so we know where in the
> page table a huge page would reside. So, replace follow_huge_addr with a
> new architecture-independent routine which will provide the same
> functionality as the follow_huge_p*d routines. We can then eliminate
> the p*d_huge checks in follow_page_mask page table walking, as well as
> the follow_huge_p*d routines themselves.
>
> follow_page_mask still has hugetlb checks via is_hugepd during page table
> walking. This is because these checks and follow_huge_pd are
> architecture specific. They can be eliminated if
> hugetlb_follow_page_mask can be overridden by architectures (powerpc)
> that need to do follow_huge_pd processing.
But won't the
> +	/* hugetlb is special */
> +	if (is_vm_hugetlb_page(vma))
> +		return hugetlb_follow_page_mask(vma, address, flags);
code route everything via hugetlb_follow_page_mask(), making all these
(beloved) hugepd checks essentially unreachable?
At least my understanding is that hugepd only applies to hugetlb.
Can't we move the hugepd handling code into hugetlb_follow_page_mask()
as well?
I mean, doesn't follow_hugetlb_page() already have to handle that hugepd
stuff ... ?
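For reference, the checks in question look roughly like this in mm/gup.c
(PGD level shown; the p4d/pud/pmd levels have analogous checks):

	if (is_hugepd(__hugepd(pgd_val(*pgd))))
		return follow_huge_pd(vma, address,
				      __hugepd(pgd_val(*pgd)), flags,
				      PGDIR_SHIFT);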
[...]
>
> +struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> +				unsigned long address, unsigned int flags)
> +{
> +	struct hstate *h = hstate_vma(vma);
> +	struct mm_struct *mm = vma->vm_mm;
> +	unsigned long haddr = address & huge_page_mask(h);
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
> +	pte_t *pte, entry;
> +
> +	/*
> +	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> +	 * follow_hugetlb_page().
> +	 */
> +	if (WARN_ON_ONCE(flags & FOLL_PIN))
> +		return NULL;
> +
> +	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
> +	if (!pte)
> +		return NULL;
> +
> +retry:
> +	ptl = huge_pte_lock(h, mm, pte);
> +	entry = huge_ptep_get(pte);
> +	if (pte_present(entry)) {
> +		/* Return the subpage that "address" maps to. */
> +		page = pte_page(entry) +
> +			((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
> +		/*
> +		 * try_grab_page() should always succeed here, because we hold
> +		 * the ptl lock and have verified pte_present().
> +		 */
> +		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
> +			page = NULL;
> +			goto out;
> +		}
> +	} else {
> +		if (is_hugetlb_entry_migration(entry)) {
> +			spin_unlock(ptl);
> +			__migration_entry_wait_huge(pte, ptl);
> +			goto retry;
> +		}
> +		/*
> +		 * hwpoisoned entry is treated as no_page_table in
> +		 * follow_page_mask().
> +		 */
> +	}
> +out:
> +	spin_unlock(ptl);
> +	return page;
> +}
This is neat and clean enough to justify not reusing follow_hugetlb_page().
I wonder if we want to add a comment to the function explaining how it
differs from follow_hugetlb_page().
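Something along these lines, maybe (wording is just a suggestion, based on
the FOLL_PIN comment above):

	/*
	 * hugetlb_follow_page_mask - look up a single page under a
	 * hugetlb VMA for follow_page()-style lookups. Ordinary GUP,
	 * including FOLL_PIN, goes via follow_hugetlb_page() instead.
	 */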
... or do we maybe want to rename follow_hugetlb_page() to something like
__hugetlb_get_user_pages() to make it clearer in which context it will
get called?
I guess it might be feasible in the future to eliminate
follow_hugetlb_page() and centralize the faulting code. For now, this
certainly improves the situation.
--
Thanks,
David / dhildenb