Message-ID: <708d1ec7-5c25-5e45-0db8-bd97a64e0db1@redhat.com>
Date:   Thu, 22 Sep 2022 09:46:35 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>,
        Liu Shixin <liushixin2@...wei.com>
Cc:     Liu Zixian <liuzixian4@...wei.com>,
        Muchun Song <songmuchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        John Hubbard <jhubbard@...dia.com>,
        Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH] mm: hugetlb: fix UAF in hugetlb_handle_userfault

On 22.09.22 01:57, Mike Kravetz wrote:
> On 09/21/22 10:48, Mike Kravetz wrote:
>> On 09/21/22 16:34, Liu Shixin wrote:
>>> The vma_lock and hugetlb_fault_mutex are dropped before handling
>>> userfault and reacquire them again after handle_userfault(), but
>>> reacquire the vma_lock could lead to UAF[1] due to the following
>>> race,
>>>
>>> hugetlb_fault
>>>    hugetlb_no_page
>>>      /*unlock vma_lock */
>>>      hugetlb_handle_userfault
>>>        handle_userfault
>>>          /* unlock mm->mmap_lock*/
>>>                                             vm_mmap_pgoff
>>>                                               do_mmap
>>>                                                 mmap_region
>>>                                                   munmap_vma_range
>>>                                                     /* clean old vma */
>>>          /* lock vma_lock again  <--- UAF */
>>>      /* unlock vma_lock */
>>>
>>> Since the vma_lock will unlock immediately after hugetlb_handle_userfault(),
>>> let's drop the unneeded lock and unlock in hugetlb_handle_userfault() to fix
>>> the issue.
>>
>> Thank you very much!
>>
>> When I saw this report, the obvious fix was to do something like what you have
>> done below.  That looks fine with a few minor comments.
>>
>> One question I have not yet answered is, "Does this same issue apply to
>> follow_hugetlb_page()?".  I believe it does.  follow_hugetlb_page calls
>> hugetlb_fault which could result in the fault being processed by userfaultfd.
>> If we experience the race above, then the associated vma could no longer be
>> valid when returning from hugetlb_fault.  follow_hugetlb_page and callers
>> have a flag (locked) to deal with dropping mmap lock.  However, I am not sure
>> if it is handled correctly WRT userfaultfd.  I think this needs to be answered
>> before fixing.  And, if the follow_hugetlb_page code needs to be fixed it
>> should be done at the same time.
>>
> 
> To at least verify this code path, I added userfaultfd handling to the gup_test
> program in kernel selftests.  When doing basic gup test on a hugetlb page in
> a userfaultfd registered range, I hit this warning:
> 
> [ 6939.867796] FAULT_FLAG_ALLOW_RETRY missing 1
> [ 6939.871503] CPU: 2 PID: 5720 Comm: gup_test Not tainted 6.0.0-rc6-next-20220921+ #72
> [ 6939.874562] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
> [ 6939.877707] Call Trace:
> [ 6939.878745]  <TASK>
> [ 6939.879779]  dump_stack_lvl+0x6c/0x9f
> [ 6939.881199]  handle_userfault.cold+0x14/0x1e
> [ 6939.882830]  ? find_held_lock+0x2b/0x80
> [ 6939.884370]  ? __mutex_unlock_slowpath+0x45/0x280
> [ 6939.886145]  hugetlb_handle_userfault+0x90/0xf0
> [ 6939.887936]  hugetlb_fault+0xb7e/0xda0
> [ 6939.889409]  ? vprintk_emit+0x118/0x3a0
> [ 6939.890903]  ? _printk+0x58/0x73
> [ 6939.892279]  follow_hugetlb_page.cold+0x59/0x145
> [ 6939.894116]  __get_user_pages+0x146/0x750
> [ 6939.895580]  __gup_longterm_locked+0x3e9/0x680
> [ 6939.897023]  ? seqcount_lockdep_reader_access.constprop.0+0xa5/0xb0
> [ 6939.898939]  ? lockdep_hardirqs_on+0x7d/0x100
> [ 6939.901243]  gup_test_ioctl+0x320/0x6e0
> [ 6939.902202]  __x64_sys_ioctl+0x87/0xc0
> [ 6939.903220]  do_syscall_64+0x38/0x90
> [ 6939.904233]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 6939.905423] RIP: 0033:0x7fbb53830f7b
> 
> This is because userfaultfd is expecting FAULT_FLAG_ALLOW_RETRY which is not
> set in this path.

Right. Without being able to drop the mmap lock, we cannot continue. And 
we don't know if we can drop it without FAULT_FLAG_ALLOW_RETRY.

FAULT_FLAG_ALLOW_RETRY is only set when we can communicate to the caller 
that we dropped the mmap lock [e.g., int *locked parameter].

All code paths that pass NULL -- surprisingly, that includes
pin_user_pages_fast() -- cannot handle a dropped mmap lock. They cannot
trigger userfaultfd and will result in this warning instead.
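
To make the rule concrete, here is a rough userspace model of it. This is
only a sketch: every model_* name and the MODEL_* constants are made up for
illustration and are not the kernel's definitions. The one thing it tries
to show is that the retry flag is derived from whether the caller handed in
a "locked" pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's real flag or return values. */
#define MODEL_FLAG_ALLOW_RETRY 0x1u

enum model_result { MODEL_DONE, MODEL_RETRY, MODEL_FAILED };

struct model_mm {
	bool mmap_locked;	/* models "mmap lock is held" */
};

/* Models a fault handler that has to wait for userspace to resolve it. */
static enum model_result model_handle_fault(struct model_mm *mm,
					     unsigned int flags)
{
	/* Without the flag we may not drop the lock, so we have to give up. */
	if (!(flags & MODEL_FLAG_ALLOW_RETRY))
		return MODEL_FAILED;

	mm->mmap_locked = false;	/* drop the lock before waiting */
	return MODEL_RETRY;
}

/* Models a GUP-style entry point with an optional "locked" out-parameter. */
static enum model_result model_gup(struct model_mm *mm, int *locked)
{
	/* The flag is set only if we can report a dropped lock to the caller. */
	unsigned int flags = locked ? MODEL_FLAG_ALLOW_RETRY : 0;
	enum model_result res;

	assert(mm->mmap_locked);	/* callers enter with the lock held */

	res = model_handle_fault(mm, flags);
	if (res == MODEL_RETRY)
		*locked = 0;		/* tell the caller the lock is gone */
	return res;
}
```

In this model a caller passing NULL always gets MODEL_FAILED with the lock
still held, which corresponds to the "simply fail the GUP get/pin" outcome.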


A "sane" example is access via /proc/self/mem via ptrace: we don't want 
to trigger userfaultfd, but instead simply fail the GUP get/pin.


Now, this is just a printed *warning* (not a WARN/BUG/taint) that tells 
us that there is a GUP user that isn't prepared for userfaultfd. So it 
rather points out a missing GUP adaptation -- incomplete userfaultfd 
support. And we seem to have plenty of that, judging by 
pin_user_pages_fast_only().

Maybe the printed stack trace is a bit too much and makes this look very 
scary.

> 
> Adding John, Peter and David on Cc: as they are much more fluent in all the
> fault and FOLL combinations and might have immediate suggestions.  It is going
> to take me a little while to figure out:
> 1) How to make sure we get the right flags passed to handle_userfault

This is a GUP caller problem -- or rather, how GUP has to deal with 
userfaultfd.

> 2) How to modify follow_hugetlb_page as userfaultfd can certainly drop
>     mmap_lock.  So we can not assume vma still exists upon return.

Again, we have to communicate to the GUP caller that we dropped the mmap 
lock. And that requires GUP caller changes.
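
As a rough illustration of what such a caller change looks like, here is a
toy userspace model. All toy_* names are hypothetical and the "mm" holds a
single mapping for brevity. The point is only that a caller who learns the
lock was dropped must retake it and look the vma up again instead of
reusing the pointer it had before the fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vma {
	unsigned long start, end;
};

struct toy_mm {
	bool locked;		/* models "mmap lock is held" */
	struct toy_vma *vma;	/* single mapping, NULL once unmapped */
};

static struct toy_vma *toy_find_vma(struct toy_mm *mm, unsigned long addr)
{
	struct toy_vma *v = mm->vma;

	return (v && addr >= v->start && addr < v->end) ? v : NULL;
}

/*
 * Models a fault that drops the lock. A racing unmap is simulated by
 * clearing the mapping while the lock is not held.
 */
static void toy_fault(struct toy_mm *mm, int *locked, bool racing_unmap)
{
	assert(mm->locked);
	mm->locked = false;
	*locked = 0;
	if (racing_unmap)
		mm->vma = NULL;
}

/* Returns true if addr is still mapped once the fault has been handled. */
static bool toy_caller(struct toy_mm *mm, unsigned long addr,
		       bool racing_unmap)
{
	int locked = 1;
	struct toy_vma *vma;

	mm->locked = true;
	vma = toy_find_vma(mm, addr);
	if (!vma) {
		mm->locked = false;
		return false;
	}

	toy_fault(mm, &locked, racing_unmap);

	if (!locked) {
		/* The old vma pointer may be stale: relock and look up again. */
		mm->locked = true;
		vma = toy_find_vma(mm, addr);
	}

	mm->locked = false;
	return vma != NULL;
}
```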

-- 
Thanks,

David / dhildenb