linux-kernel - Re: [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE size hugetlb page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <f5f08e69-83d0-78af-39fb-f2180d13086d@linux.alibaba.com>
Date:   Wed, 24 Aug 2022 22:30:23 +0800
From:   Baolin Wang <baolin.wang@...ux.alibaba.com>
To:     David Hildenbrand <david@...hat.com>,
        Mike Kravetz <mike.kravetz@...cle.com>
Cc:     akpm@...ux-foundation.org, songmuchun@...edance.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE
 size hugetlb page



On 8/24/2022 7:55 PM, David Hildenbrand wrote:
> On 24.08.22 11:41, Baolin Wang wrote:
>>
>>
>> On 8/24/2022 3:31 PM, David Hildenbrand wrote:
>>>>>>>
>>>>>>> IMHO, these follow_huge_xxx() functions are arch-specified at first and
>>>>>>> were moved into the common hugetlb.c by commit 9e5fc74c3025 ("mm:
>>>>>>> hugetlb: Copy general hugetlb code from x86 to mm"), and now there are
>>>>>>> still some arch-specified follow_huge_xxx() definition, for example:
>>>>>>> ia64: follow_huge_addr
>>>>>>> powerpc: follow_huge_pd
>>>>>>> s390: follow_huge_pud
>>>>>>>
>>>>>>> What I mean is that follow_hugetlb_page() is a common and
>>>>>>> not-arch-specified function, is it suitable to change it to be
>>>>>>> arch-specified?
>>>>>>> And thinking more, can we rename follow_hugetlb_page() as
>>>>>>> hugetlb_page_faultin() and simplify it to only handle the page faults of
>>>>>>> hugetlb like the faultin_page() for normal page? That means we can make
>>>>>>> sure only follow_page_mask() can handle hugetlb.
>>>>>>>
>>>>>
>>>>> Something like that might work, but you still have two page table walkers
>>>>> for hugetlb.  I like David's idea (if I understand it correctly) of
>>>>
>>>> What I mean is we may change the hugetlb handling like normal page:
>>>> 1) use follow_page_mask() to look up a hugetlb firstly.
>>>> 2) if can not get the hugetlb, then try to page fault by
>>>> hugetlb_page_faultin().
>>>> 3) if page fault successed, then retry to find hugetlb by
>>>> follow_page_mask().
>>>
>>> That implies putting more hugetlbfs special code into generic GUP,
>>> turning it even more complicated. But of course, it depends on how the
>>> end result looks like. My gut feeling was that hugetlb is better handled
>>> in follow_hugetlb_page() separately (just like we do with a lot of other
>>> page table walkers).
>>
>> OK, fair enough.
>>
>>>>
>>>> Just a rough thought, and I need more investigation for my idea and
>>>> David's idea.
>>>>
>>>>> using follow_hugetlb_page for both cases.  As noted, it will need to be
>>>>> taught how to not trigger faults in the follow_page_mask case.
>>>>
>>>> Anyway, I also agree we need some cleanup, and firstly I think we should
>>>> cleanup these arch-specified follow_huge_xxx() on some architectures
>>>> which are similar with the common ones. I will look into these.
>>>
>>> There was a recent discussion on that, e.g.:
>>>
>>> https://lkml.kernel.org/r/20220818135717.609eef8a@thinkpad
>>
>> Thanks.
>>
>>>
>>>>
>>>> However, considering cleanup may need more investigation and
>>>> refactoring, now I prefer to make these bug-fix patches of this patchset
>>>> into mainline firstly, which are suitable to backport to old version to
>>>> fix potential race issues. Mike and David, how do you think? Could you
>>>> help to review these patches? Thanks.
>>>
>>> Patch #1 certainly add more special code just to handle another hugetlb
>>> corner case (CONT pages), and maybe just making it all use
>>> follow_hugetlb_page() would be even cleaner and less error prone.
>>>
>>> I agree that locking is shaky, but I'm not sure if we really want to
>>> backport this to stable trees:
>>>
>>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>>>
>>> "It must fix a real bug that bothers people (not a, “This could be a
>>> problem...” type thing)."
>>>
>>>
>>> Do we actually have any instance of this being a real (and not a
>>> theoretical) problem? If not, I'd rather clean it all up right away.
>>
>> I think this is a real problem (not theoretical), and easy to write some
>> code to show the issue. For example, suppose thread A is trying to look
>> up a CONT-PTE size hugetlb page under the lock, however antoher thread B
>> can migrate the CONT-PTE hugetlb page at the same time, which will cause
>> thread A to get an incorrect page, if thread A want to do something for
>> this incorrect page, error occurs.
>>
>> Actually we also want to backport these fixes to the distro with old
>> kernel versions to make the hugetlb more stable. Otherwise we must hit
>> these issues sooner or later if the customers use CONT-PTE/PMD hugetlb.
>>
>> Anyway, if you and Mike still think these issues are not important
>> enough to be fixed in the old versions, I can do the cleanup firstly.
>>
> 
> [asking myself which follow_page() users actually care about hugetlb,
> and why we need this handling in follow_page at all]
> 
> Which follow_page() user do we care about here? Primarily mm/migrate.c
> only I assume?

Right, mainly affects the move_pages() syscall I think. Yes, I can not 
know all of the users of the move_pages() syscall now or in the future 
in our data center, but like I said the move_pages() syscall + hugetlb 
can be a real potential stability issue.