[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <BF2C0154-4DC1-4B29-A7D1-F2192AFA9B4E@fb.com>
Date: Thu, 13 Jun 2019 17:42:39 +0000
From: Song Liu <songliubraving@...com>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
CC: "Kirill A. Shutemov" <kirill@...temov.name>,
LKML <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"namit@...are.com" <namit@...are.com>,
"peterz@...radead.org" <peterz@...radead.org>,
"oleg@...hat.com" <oleg@...hat.com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"mhiramat@...nel.org" <mhiramat@...nel.org>,
"matthew.wilcox@...cle.com" <matthew.wilcox@...cle.com>,
Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH v3 3/5] mm, thp: introduce FOLL_SPLIT_PMD
> On Jun 13, 2019, at 9:47 AM, Song Liu <songliubraving@...com> wrote:
>
> Hi Kirill,
>
>> On Jun 13, 2019, at 8:24 AM, Song Liu <songliubraving@...com> wrote:
>>
>>
>>
>>> On Jun 13, 2019, at 8:14 AM, Kirill A. Shutemov <kirill.shutemov@...ux.intel.com> wrote:
>>>
>>> On Thu, Jun 13, 2019 at 03:03:01PM +0000, Song Liu wrote:
>>>>
>>>>
>>>>> On Jun 13, 2019, at 7:16 AM, Kirill A. Shutemov <kirill@...temov.name> wrote:
>>>>>
>>>>> On Thu, Jun 13, 2019 at 01:57:30PM +0000, Song Liu wrote:
>>>>>>> And I'm not convinced that it belongs here at all. User requested PMD
>>>>>>> split and it is done after split_huge_pmd(). The rest can be handled by
>>>>>>> the caller as needed.
>>>>>>
>>>>>> I put this part here because split_huge_pmd() for file-backed THP is
>>>>>> not really done after split_huge_pmd(). And I would like it done before
>>>>>> calling follow_page_pte() below. Maybe we can still do them here, just
>>>>>> for file-backed THPs?
>>>>>>
>>>>>> If we would move it, shall we move to callers of follow_page_mask()?
>>>>>> In that case, we will probably end up with similar code in two places:
>>>>>> __get_user_pages() and follow_page().
>>>>>>
>>>>>> Did I get this right?
>>>>>
>>>>> Would it be enough to replace pte_offset_map_lock() in follow_page_pte()
>>>>> with pte_alloc_map_lock()?
>>>>
>>>> This is similar to my previous version:
>>>>
>>>> + } else { /* flags & FOLL_SPLIT_PMD */
>>>> + pte_t *pte;
>>>> + spin_unlock(ptl);
>>>> + split_huge_pmd(vma, pmd, address);
>>>> + pte = get_locked_pte(mm, address, &ptl);
>>>> + if (!pte)
>>>> + return no_page_table(vma, flags);
>>>> + spin_unlock(ptl);
>>>> + ret = 0;
>>>> + }
>>>>
>>>> I think this is cleaner than use pte_alloc_map_lock() in follow_page_pte().
>>>> What's your thought on these two versions (^^^ vs. pte_alloc_map_lock)?
>>>
>>> It's additional lock-unlock cycle and few more lines of code...
>>>
>>>>> This will leave bunch not populated PTE entries, but it is fine: they will
>>>>> be populated on the next access to them.
>>>>
>>>> We need to handle page fault during next access, right? Since we already
>>>> allocated everything, we can just populate the PTE entries and saves a
>>>> lot of page faults (assuming we will access them later).
>>>
>>> Not a lot due to faultaround and they may never happen, but you need to
>>> tear down the mapping any way.
>>
>> I see. Let me try this way.
>>
>> Thanks,
>> Song
>
> To make sure I understand your suggestions. Here is what I got:
>
> diff --git c/mm/gup.c w/mm/gup.c
> index ddde097cf9e4..85e6f46fd925 100644
> --- c/mm/gup.c
> +++ w/mm/gup.c
> @@ -197,7 +197,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> if (unlikely(pmd_bad(*pmd)))
> return no_page_table(vma, flags);
>
> - ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
> + ptep = pte_alloc_map_lock(mm, pmd, address, &ptl);
> + if (!ptep)
> + return ERR_PTR(-ENOMEM);
> +
> pte = *ptep;
> if (!pte_present(pte)) {
> swp_entry_t entry;
> @@ -398,7 +401,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
> spin_unlock(ptl);
> return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> }
> - if (flags & FOLL_SPLIT) {
> + if (flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)) {
> int ret;
> page = pmd_page(*pmd);
> if (is_huge_zero_page(page)) {
> @@ -407,7 +410,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
> split_huge_pmd(vma, pmd, address);
> if (pmd_trans_unstable(pmd))
> ret = -EBUSY;
> - } else {
> + } else if (flags & FOLL_SPLIT) {
> if (unlikely(!try_get_page(page))) {
> spin_unlock(ptl);
> return ERR_PTR(-ENOMEM);
> @@ -419,6 +422,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
> put_page(page);
> if (pmd_none(*pmd))
> return no_page_table(vma, flags);
> + } else { /* flags & FOLL_SPLIT_PMD */
> + spin_unlock(ptl);
> + split_huge_pmd(vma, pmd, address);
> + ret = 0;
> }
>
> return ret ? ERR_PTR(ret) :
> follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
>
>
> This version doesn't work as-is, because it returns at the first check:
>
> if (unlikely(pmd_bad(*pmd)))
> return no_page_table(vma, flags);
>
> Did I misunderstand anything here?
>
> Thanks,
> Song
I guess this would be the best. It _is_ a lot simpler.
diff --git c/mm/gup.c w/mm/gup.c
index ddde097cf9e4..0cd3ce599f41 100644
--- c/mm/gup.c
+++ w/mm/gup.c
@@ -398,7 +398,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
spin_unlock(ptl);
return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
}
- if (flags & FOLL_SPLIT) {
+ if (flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)) {
int ret;
page = pmd_page(*pmd);
if (is_huge_zero_page(page)) {
@@ -407,7 +407,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
split_huge_pmd(vma, pmd, address);
if (pmd_trans_unstable(pmd))
ret = -EBUSY;
- } else {
+ } else if (flags & FOLL_SPLIT) {
if (unlikely(!try_get_page(page))) {
spin_unlock(ptl);
return ERR_PTR(-ENOMEM);
@@ -419,6 +419,11 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
put_page(page);
if (pmd_none(*pmd))
return no_page_table(vma, flags);
+ } else { /* flags & FOLL_SPLIT_PMD */
+ spin_unlock(ptl);
+ ret = 0;
+ split_huge_pmd(vma, pmd, address);
+ pte_alloc(mm, pmd);
}
Thanks again for the suggestions. I will send v4 soon.
Song
>
>
>>
>>>
>>> --
>>> Kirill A. Shutemov
Powered by blists - more mailing lists