linux-kernel - Re: [PATCH v3] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <78359cf0-6e28-2aaa-d17e-6519b117b3db@oracle.com>
Date:   Mon, 24 May 2021 17:45:27 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Mina Almasry <almasrymina@...gle.com>
Cc:     Axel Rasmussen <axelrasmussen@...gle.com>,
        Peter Xu <peterx@...hat.com>, Linux-MM <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3] mm, hugetlb: fix resv_huge_pages underflow on
 UFFDIO_COPY

On 5/24/21 5:11 PM, Mina Almasry wrote:
>>> +                     if (!HPageRestoreReserve(page)) {
>>> +                             if (unlikely(hugetlb_unreserve_pages(
>>> +                                         mapping->host, idx, idx + 1, 1)))
>>> +                                     hugetlb_fix_reserve_counts(
>>> +                                             mapping->host);
>>> +                     }
>>
>> I do not understand the need to call hugetlb_unreserve_pages().  The
>> call to restore_reserve_on_error 'should' fix up the reserve map to
>> align with restoring the reserve count in put_page/free_huge_page.
>> Can you explain why that is there?
>>
> 
> AFAICT here is what happens for a given index *without* the call to
> hugetlb_unreserve_pages():
> 
> 1. hugetlb_no_page() allocates a page consuming the reservation,
> resv_huge_pages decrements.

Ok

> 2. remove_inode_hugepages() does remove_huge_page() and
> hugetlb_unreserve_pages(). This removes the entry from the resv_map,
> but does NOT increment back the resv_huge_pages. Because we removed
> the entry, it looks like we have no reservation for this index.
> free_huge_page() gets called on this page, and resv_huge_pages is not
> incremented, I'm not sure why. This page should have come from the
> reserves.

This corresponds to a 'hole punch' operation.  When hugetlbfs hole punch
was added, a design decision was made to not try to restore reservations.
IIRC, this is because the hole punch is associated with the file and
reservations are associated with mappings.  Trying to 'go back' to
associated mappings and determine if a reservation should be restored
would be a difficult exercise.

> 3. hugetlb_mcopy_pte_atomic() gets called for this index. Because of
> the prior call to hugetlb_unreserve_page(), there is no entry in the
> resv_map for this index, which means it looks like we don't have a
> reservation for this index. We allocate a page outside the reserves
> (deferred_reservation=1, HPageRestoreReserve=0), add an entry into
> resv_map, and don't modify resv_huge_pages.

Ok

> 4. The copy fails and we deallocate the page, since
> HPageRestoreReserve==0 for this page, restore_reserve_on_error() does
> nothing.

Yes.  And this may be something we want to fix in the more general case.
i.e. I believe this can happen in code paths other than
hugetlb_mcopy_pte_atomic.  Rather than add special code here, I'll look
into updating restore_reserve_on_error to handle this.  Currently
restore_reserve_on_error only handles the case where HPageRestoreReserve
flags is set.

Thanks for explaining this!  I forgot about this 'special case' where
there is not an entry in the reserve map for a shared mapping.

-- 
Mike Kravetz

> 5. hugetlb_mcopy_pte_atomic() gets recalled with the temporary page,
> and we allocate another page. Now, since we added an entry in the
> resv_map in the previous allocation, it looks like we have a
> reservation for this allocation. We allocate a page with
> deferred_reserve=0 && HPageRestoreReserve=1, we decrement
> resv_huge_pages. Boom, we decremented resv_huge_pages twice for this
> index, never incremented it.
> 
> To fix this, in step 4, when I deallocate a page, I check
> HPageRestoreReserve(page). If HPageRestoreReserve=0, then this
> reservation was consumed and deallocated before, and so I need to
> remove the entry from the resv_map.