[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d5af7a4a-7626-d20b-e34a-45be8144d6e8@linux.alibaba.com>
Date: Wed, 24 Aug 2022 23:23:10 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: David Hildenbrand <david@...hat.com>,
Mike Kravetz <mike.kravetz@...cle.com>
Cc: akpm@...ux-foundation.org, songmuchun@...edance.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE
size hugetlb page
On 8/24/2022 11:13 PM, David Hildenbrand wrote:
> On 24.08.22 17:06, Baolin Wang wrote:
>>
>>
>> On 8/24/2022 10:33 PM, David Hildenbrand wrote:
>>> On 24.08.22 16:30, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/24/2022 7:55 PM, David Hildenbrand wrote:
>>>>> On 24.08.22 11:41, Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 8/24/2022 3:31 PM, David Hildenbrand wrote:
>>>>>>>>>>>
>>>>>>>>>>> IMHO, these follow_huge_xxx() functions are arch-specified at first and
>>>>>>>>>>> were moved into the common hugetlb.c by commit 9e5fc74c3025 ("mm:
>>>>>>>>>>> hugetlb: Copy general hugetlb code from x86 to mm"), and now there are
>>>>>>>>>>> still some arch-specified follow_huge_xxx() definition, for example:
>>>>>>>>>>> ia64: follow_huge_addr
>>>>>>>>>>> powerpc: follow_huge_pd
>>>>>>>>>>> s390: follow_huge_pud
>>>>>>>>>>>
>>>>>>>>>>> What I mean is that follow_hugetlb_page() is a common and
>>>>>>>>>>> not-arch-specified function, is it suitable to change it to be
>>>>>>>>>>> arch-specified?
>>>>>>>>>>> And thinking more, can we rename follow_hugetlb_page() as
>>>>>>>>>>> hugetlb_page_faultin() and simplify it to only handle the page faults of
>>>>>>>>>>> hugetlb like the faultin_page() for normal page? That means we can make
>>>>>>>>>>> sure only follow_page_mask() can handle hugetlb.
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Something like that might work, but you still have two page table walkers
>>>>>>>>> for hugetlb. I like David's idea (if I understand it correctly) of
>>>>>>>>
>>>>>>>> What I mean is we may change the hugetlb handling like normal page:
>>>>>>>> 1) use follow_page_mask() to look up a hugetlb firstly.
>>>>>>>> 2) if can not get the hugetlb, then try to page fault by
>>>>>>>> hugetlb_page_faultin().
>>>>>>>> 3) if page fault successed, then retry to find hugetlb by
>>>>>>>> follow_page_mask().
>>>>>>>
>>>>>>> That implies putting more hugetlbfs special code into generic GUP,
>>>>>>> turning it even more complicated. But of course, it depends on how the
>>>>>>> end result looks like. My gut feeling was that hugetlb is better handled
>>>>>>> in follow_hugetlb_page() separately (just like we do with a lot of other
>>>>>>> page table walkers).
>>>>>>
>>>>>> OK, fair enough.
>>>>>>
>>>>>>>>
>>>>>>>> Just a rough thought, and I need more investigation for my idea and
>>>>>>>> David's idea.
>>>>>>>>
>>>>>>>>> using follow_hugetlb_page for both cases. As noted, it will need to be
>>>>>>>>> taught how to not trigger faults in the follow_page_mask case.
>>>>>>>>
>>>>>>>> Anyway, I also agree we need some cleanup, and firstly I think we should
>>>>>>>> cleanup these arch-specified follow_huge_xxx() on some architectures
>>>>>>>> which are similar with the common ones. I will look into these.
>>>>>>>
>>>>>>> There was a recent discussion on that, e.g.:
>>>>>>>
>>>>>>> https://lkml.kernel.org/r/20220818135717.609eef8a@thinkpad
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> However, considering cleanup may need more investigation and
>>>>>>>> refactoring, now I prefer to make these bug-fix patches of this patchset
>>>>>>>> into mainline firstly, which are suitable to backport to old version to
>>>>>>>> fix potential race issues. Mike and David, how do you think? Could you
>>>>>>>> help to review these patches? Thanks.
>>>>>>>
>>>>>>> Patch #1 certainly add more special code just to handle another hugetlb
>>>>>>> corner case (CONT pages), and maybe just making it all use
>>>>>>> follow_hugetlb_page() would be even cleaner and less error prone.
>>>>>>>
>>>>>>> I agree that locking is shaky, but I'm not sure if we really want to
>>>>>>> backport this to stable trees:
>>>>>>>
>>>>>>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>>>>>>>
>>>>>>> "It must fix a real bug that bothers people (not a, “This could be a
>>>>>>> problem...” type thing)."
>>>>>>>
>>>>>>>
>>>>>>> Do we actually have any instance of this being a real (and not a
>>>>>>> theoretical) problem? If not, I'd rather clean it all up right away.
>>>>>>
>>>>>> I think this is a real problem (not theoretical), and easy to write some
>>>>>> code to show the issue. For example, suppose thread A is trying to look
>>>>>> up a CONT-PTE size hugetlb page under the lock, however antoher thread B
>>>>>> can migrate the CONT-PTE hugetlb page at the same time, which will cause
>>>>>> thread A to get an incorrect page, if thread A want to do something for
>>>>>> this incorrect page, error occurs.
>>>>>>
>>>>>> Actually we also want to backport these fixes to the distro with old
>>>>>> kernel versions to make the hugetlb more stable. Otherwise we must hit
>>>>>> these issues sooner or later if the customers use CONT-PTE/PMD hugetlb.
>>>>>>
>>>>>> Anyway, if you and Mike still think these issues are not important
>>>>>> enough to be fixed in the old versions, I can do the cleanup firstly.
>>>>>>
>>>>>
>>>>> [asking myself which follow_page() users actually care about hugetlb,
>>>>> and why we need this handling in follow_page at all]
>>>>>
>>>>> Which follow_page() user do we care about here? Primarily mm/migrate.c
>>>>> only I assume?
>>>>
>>>> Right, mainly affects the move_pages() syscall I think. Yes, I can not
>>>> know all of the users of the move_pages() syscall now or in the future
>>>> in our data center, but like I said the move_pages() syscall + hugetlb
>>>> can be a real potential stability issue.
>>>>
>>>
>>> I wonder if we can get rid of follow_page() completely, there are not
>>> too many users. Or alternatively simply make it use general GUP
>>> infrastructure more clearly. We'd need something like FOLL_NOFAULT that
>>> also covers "absolutely no faults".
>>
>> I am not sure I get your point. So you want change to use
>> __get_user_pages() (or silimar wrappers) to look up a normal page or
>> hugetlb instead of follow_page()? and adding a new FOLL_NOFAULT flag to
>> __get_user_pages().
>
> Essentially just getting rid of follow_page() completely or making it a
> wrapper of __get_user_pages().
OK.
>
>>
>> If I understand correctly, we still need more work to move those
>> arch-specified follow_huge_xxx() into follow_hugetlb_page() firstly like
>> we disscussed before? Which seems not backportable too.
>
> I'm not sure we need all that magic in these arch specific helpers after
> all. I haven't looked into the details, but I really wonder why they
> handle something that follow_hugetlb_page() cannot easily handle. It all
> smells like legacy cruft.
Agree, not sure if there are some historical legacy issue, need more
investigation.
>
>>
>> I am not againt your idea, and I also agree that we should do some
>> cleanup. But the point is if we need backport patches to fix this issue,
>> which affects move_pages() syscall, if the answer is yes, I think my
>> current fixes are suitable to backport.
>
> I really don't like adding more make-legacy-cruft-happy code unless
> there is *real* need for it. (you could always just fix old kernels you
> care about with your patches here -- do they have to be in mainline?
Not have to, but better...
> don't think so)
>
> But of course, it's up to Mike to decide, just my 2 cents :)
Thanks for your useful comments, and let's see what's the view of Mike :)
Powered by blists - more mailing lists