Message-ID: <3613e166-41a9-62b2-81a3-c775b531a9c7@huawei.com>
Date:   Tue, 26 Apr 2022 19:41:52 +0800
From:   Miaohe Lin <linmiaohe@...wei.com>
To:     HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>
CC:     Mike Kravetz <mike.kravetz@...cle.com>,
        David Hildenbrand <david@...hat.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Oscar Salvador <osalvador@...e.de>
Subject: Re: [PATCH] mm/memory_hotplug: avoid consuming corrupted data when
 offline pages

On 2022/4/25 21:55, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Fri, Apr 22, 2022 at 02:53:48PM +0800, Miaohe Lin wrote:
>> On 2022/4/22 6:04, Mike Kravetz wrote:
>>> On 4/21/22 07:20, David Hildenbrand wrote:
>>>> On 21.04.22 15:51, Miaohe Lin wrote:
>>>>> When trying to offline pages, a HWPoisoned hugepage is migrated without
>>>>> checking PageHWPoison first, so corrupted data could be consumed. Fix
>>>>> it by deferring isolate_huge_page() until PageHWPoison is handled.
>>>>>
>>>>
>>>> CCing Oscar, Mike and Naoya
>>>>
>>>>> Signed-off-by: Miaohe Lin <linmiaohe@...wei.com>
>>>>> ---
>>>>>  mm/memory_hotplug.c | 11 +++++++----
>>>>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>>> index 4c6065e5d274..093f85ec5c5c 100644
>>>>> --- a/mm/memory_hotplug.c
>>>>> +++ b/mm/memory_hotplug.c
>>>>> @@ -1600,11 +1600,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>>  		folio = page_folio(page);
>>>>>  		head = &folio->page;
>>>>>  
>>>>> -		if (PageHuge(page)) {
>>>>> +		if (PageHuge(page))
>>>>>  			pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>> -			isolate_huge_page(head, &source);
>>>>> -			continue;
>>>>> -		} else if (PageTransHuge(page))
>>>>> +		else if (PageTransHuge(page))
>>>>>  			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>>  
>>>>>  		/*
>>>>> @@ -1622,6 +1620,11 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>>  			continue;
>>>>>  		}
>>>>>  
>>>>> +		if (PageHuge(page)) {
>>>>> +			isolate_huge_page(head, &source);
>>>>> +			continue;
>>>>> +		}
>>>>> +
>>>>>  		if (!get_page_unless_zero(page))
>>>>>  			continue;
>>>>>  		/*
>>>>
>>>> The problem statement makes sense to me but I am not sure about the
>>>> details if we run into the "PageHWPoison" path with a huge page. I have
>>>> the gut feeling that we have to do more for huge pages in the
>>>> PageHWPoison() path, because we might be dealing with a free huge page
>>>> after unmap succeeds. I might be wrong.
>>>>
>>>
>>> Thinking about memory errors always makes my head hurt :)
>>
>> Me too. ;)
>>
>>>
>>> What 'state' could a poisoned hugetlb page be in here?
>>> - Mapped into a process address space?
>>> - On the hugetlb free lists?
> 
> There seem to be at least 2 cases where a hwpoisoned hugetlb page remains undissolved.
>   (1) Hwpoisoned hugepage in free hugepage list
>   (2) Hwpoisoned hugepage belonging to a hugetlbfs file
> 
>>>
>>> IIUC, when poisoning a hugetlb page we try to dissolve those that are
>>> free and unmap those which are mapped.  So, this means those operations
>>> must have failed for some reason.  Is that correct?
> 
> Yes, (1) results from a failure in error handling, but (2) can occur even
> when a memory error on an in-use hugepage is successfully handled.

Could you please explain (2) in more detail? IIUC, an in-use hugepage belonging to a hugetlbfs
file will be removed from the pagecache via hugetlbfs_error_remove_page. So when the error is
successfully handled, hugepage->mapping will be NULL and the page will no longer belong to a
hugetlbfs file. Or am I missing something?
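
For reference, the chain as I understand it (paraphrased from memory, so the
exact function names may not match the source):

	memory_failure()
	  -> identify_page_state()
	       -> me_huge_page()
	            -> truncate_error_page()
	                 -> mapping->a_ops->error_remove_page()
	                      /* hugetlbfs_error_remove_page() removes the page
	                       * from the pagecache, clearing page->mapping */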

> 
>>
>> I think you're right.
>>
>> BTW: now that PageHWPoison is set under hugetlb_lock and HPageFreed is also checked
>> under that lock, handling a hugetlb page on the free lists should be more likely to
>> succeed, because the PageHWPoison flag prevents the possible race with hugetlb page
>> allocation or demotion. So the page here is more likely to be a mapped hugetlb page.
>> But that doesn't matter.
>>
>>>
>>> Now, IF the above is true this implies there is a poisoned page somewhere
>>> within the hugetlb page.  But, poison is only marked in the head page.
>>> So, we do not really know where within the page the actual error exists.
>>> Is my reasoning still correct?
>>>
>>
>> IIRC, Naoya once proposed to store the poisoned page info in a subpage field.
> 
> Yes, it's important to keep the offset of the raw page.  I have a "storing raw
> error info" patch for this, but it's still too immature to submit.
> 
>>
>>> If my reasoning is correct, then I am not sure what we can do here.
>>> The code to handle poison is:
>>>
>>>                  if (PageHWPoison(page)) {
>>>                         if (WARN_ON(folio_test_lru(folio)))
>>>                                 folio_isolate_lru(folio);
>>>                         if (folio_mapped(folio))
>>>                                 try_to_unmap(folio, TTU_IGNORE_MLOCK);
>>>                         continue;
>>>                 }
>>>
>>> My first observation is that if a hugetlb page is passed to try_to_unmap
>>> as above we may BUG.  This is because we need to hold i_mmap_rwsem in
>>> write mode because of the possibility of calling huge_pmd_unshare.  :(
>>
>> I'm sorry. I missed that case. :(
> 
> OK, so the above check should not be called for hugetlb pages.
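
(For reference, my rough understanding of what the locking Mike describes
would look like here; an untested sketch, and note that
hugetlb_page_mapping_lock_write() returns NULL for anonymous hugepages:)

	if (PageHuge(page) && PageHWPoison(page)) {
		/*
		 * Takes i_mmap_rwsem in write mode, which try_to_unmap()
		 * needs for hugetlb pages because it may call
		 * huge_pmd_unshare().
		 */
		struct address_space *mapping = hugetlb_page_mapping_lock_write(head);

		if (mapping) {
			try_to_unmap(folio, TTU_IGNORE_MLOCK | TTU_RMAP_LOCKED);
			i_mmap_unlock_write(mapping);
		}
		continue;
	}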
> 
>>
>>>
>>> I 'think' try_to_unmap could succeed on a poisoned hugetlb page once we
>>> add correct locking.  So, the page would be unmapped.  I do not think anyone
>>> is holding a ref, so the last unmap should put the hugetlb page on the
>>> free list.  Correct?  We will not migrate the page, but ...
> 
> That seems to depend on whether the hugepage is anonymous or file-backed.
> For an anonymous hugepage, the above statement is correct.
> 
> As for what we can do, for (1) we can simply skip the hwpoisoned hugepage
> in do_migrate_range() (i.e. no need to call isolate_huge_page()), because
> the outer while-loop in offline_pages() calls dissolve_free_huge_pages(),
> so the free hugepage can be handled there.  The above "storing raw error
> info" patch will also enable dissolve_free_huge_pages() to handle a
> hwpoisoned free hugepage (by moving the PG_hwpoison flag to the raw error
> page), so I assume that patch is in place.

Yes, I think (1) will be fixed this way.
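
Something like the following, I guess (an untested sketch of my reading of
your suggestion, reusing the context from my patch above):

	if (PageHuge(page)) {
		pfn = page_to_pfn(head) + compound_nr(head) - 1;
		/*
		 * Skip hwpoisoned hugepages; the dissolve_free_huge_pages()
		 * call in the offline_pages() loop will handle them once
		 * the "storing raw error info" patch is in place.
		 */
		if (PageHWPoison(head))
			continue;
		isolate_huge_page(head, &source);
		continue;
	}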

> 
> As for (2), I think that making such hugepages unmovable (by clearing
> HPageMigratable) could work.  With that change, scan_movable_pages() detects
> unmovable pages so do_migrate_range is not called for the hugepages.
> Moreover, scan_movable_pages() should return -EBUSY if an unmovable page is found,
> so offline_pages() fails without trying to touch the hwpoisoned hugepage.

Yes, that seems to work, but it will cause the memory offline to fail. Could this be overkill?
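
Just to confirm my understanding, the change would be roughly the following
(untested sketch; the locking details may well be off):

	/*
	 * When memory_failure() handles an in-use hugepage, mark it
	 * unmovable so that scan_movable_pages() reports it and, with
	 * your proposed change, offline_pages() fails with -EBUSY
	 * instead of trying to migrate it.
	 */
	spin_lock_irq(&hugetlb_lock);
	ClearHPageMigratable(head);
	spin_unlock_irq(&hugetlb_lock);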

> Anyway, the reported problem (trying to migrate a hwpoisoned hugepage) would be solved.
> 
> # Maybe we might optionally try to free the hwpoisoned hugepage by removing
> # the affected hugetlbfs file, but it's unclear to me whether that's acceptable
> # (a file is gone when doing memory hotremove?).
> 
>>
>> Might the hugetlb page still be in the pagecache? But yes, the last unmap could
>> put the hugetlb page on the free list.
>>
>>>
>>> After the call to do_migrate_range() in offline_pages, we will call
>>> dissolve_free_huge_pages.  For each hugetlb page, dissolve_free_huge_pages
>>> will call dissolve_free_huge_page likely passing in the 'head' page.
>>> When dissolve_free_huge_page is called for poisoned hugetlb pages from
>>> the memory error handling code, it passes in the sub page which contains
>>> the memory error.  Before freeing the hugetlb page to buddy, there is this
>>> code:
>>>
>>>                         /*
>>>                          * Move PageHWPoison flag from head page to the raw
>>>                          * error page, which makes any subpages rather than
>>>                          * the error page reusable.
>>>                          */
>>>                         if (PageHWPoison(head) && page != head) {
>>>                                 SetPageHWPoison(page);
>>>                                 ClearPageHWPoison(head);
>>>                         }
>>>                         update_and_free_page(h, head, false);
>>>
>>> As previously mentioned, outside the memory error handling code we do
>>> not know which page within the hugetlb page contains the poison.  So, this
>>> will likely result in a page with errors on the free list and an OK
>>> page marked with error.
>>
>> Yes, this is possible. It's really a pity. Could we just refuse to dissolve the
>> hugetlb page and keep it around? This would waste some healthy subpages
>> but would do the right thing?
> 
> Could you let me try writing patches following my approach?
> I'll try to submit the fix with the "storing raw error info" patch.

Sure, I will wait for that. Many thanks for your hard work! :)

> 
>>
>>>
>>> In other places, we 'bail' if we encounter a hugetlb page with poison.
>>> It would be unfortunate to prevent memory offline if the range contains
>>> a hugetlb page with poison.  After all, offlining a section with poison
>>> makes sense.
>>>
>>
>> IIUC, a (hugetlb) page with poison could be offlined at any time even if it's still in the
>> pagecache or swapcache, i.e. still referenced from somewhere, because memory offline will
>> just offline the poisoned page while ignoring the page refcount entirely. I can't figure
>> out an easy way to fix it yet. :(
> 
> Yes, that's ultimately the expected behavior, but unlike the normal pagecache we need
> to pay extra attention to hugetlbfs files because they are not storage-backed.
> 
> Thank you for valuable comments, everyone.

BTW: there is another possible problem with hwpoison. Consider the following scenario:

offline_pages
  do {
    scan_movable_pages
      __PageMovable             /* page is also hwpoisoned */
        goto found;             /* so ret is always 0 */
    do_migrate_range
      if (PageHWPoison(page))   /* PageMovable hwpoisoned page is not handled */
        continue;
  } while (!ret);               /* we will meet that page again and again */

So we might busy-loop until a signal is pending. Or am I missing something? I think this
could be fixed by deferring setting the hwpoisoned flag until the page refcount is
successfully incremented for a normal page in memory_failure(), which is already on your
to-do list. So could this possible issue be left alone for now?
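
The ordering fix I have in mind would be roughly (just to illustrate the
idea, not a real patch; the error handling here is hypothetical):

	/*
	 * In memory_failure(): take the page refcount first and set
	 * PG_hwpoison only once that succeeds, so a page we fail to
	 * handle is not left behind with the flag set.
	 */
	res = get_hwpoison_page(p, flags);
	if (res <= 0)
		return res ? res : -EBUSY;
	SetPageHWPoison(p);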

Thanks!

> 
> - Naoya Horiguchi
> 
