lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2fc40750-2ff7-4e73-ba52-c4d9caaa4f4f@redhat.com>
Date: Tue, 21 Jan 2025 08:58:41 +0100
From: David Hildenbrand <david@...hat.com>
To: Miaohe Lin <linmiaohe@...wei.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 akpm@...ux-foundation.org, osalvador@...e.de, nao.horiguchi@...il.com,
 mhocko@...e.com, Wupeng Ma <mawupeng1@...wei.com>
Subject: Re: [PATCH v2 1/3] mm: memory-failure: update ttu flag inside
 unmap_poisoned_folio

On 21.01.25 04:20, Miaohe Lin wrote:
> On 2025/1/20 16:46, David Hildenbrand wrote:
>> On 20.01.25 08:49, David Hildenbrand wrote:
>>>
>>>>>         if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
>>>>>             struct address_space *mapping;
>>>>>     @@ -1572,7 +1598,7 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
>>>>>             if (!mapping) {
>>>>>                 pr_info("%#lx: could not lock mapping for mapped hugetlb folio\n",
>>>>>                     folio_pfn(folio));
>>>>> -            return;
>>>>> +            return -EBUSY;
>>>>>             }
>>>>>                try_to_unmap(folio, ttu|TTU_RMAP_LOCKED);
>>>>> @@ -1580,6 +1606,8 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
>>>>>         } else {
>>>>>             try_to_unmap(folio, ttu);
>>>>>         }
>>>>> +
>>>>> +    return folio_mapped(folio) ? -EBUSY : 0;
>>>>
>>>> Do we really need this return value? It's unused in do_migrate_range().
>>>
>>> I suggested it, because the folio_mapped() is nowadays extremely cheap.
>>> It cleans up hwpoison_user_mappings() quite nicely.
>>
>> I'm also wondering, if in do_migrate_range(), we want to pr_warn_ratelimit() in case still mapped after the call. IIUC, we don't really expect this to happen with SYNC set.
> 
> Do you mean TTU_SYNC? It seems it's not set.

With your patch it will be now, which is the right thing to do I think.

> 
> There might be a race will hit the proposed pr_warn_ratelimit():
> 
> /* Assume folio is isolated for reclaim, so memory_failure failed to handle it at first time. Then it's put back to LRU. */
> do_migrate_range
>   folio_test_hwpoison
>    folio_mapped
>    <folio is isolated for reclaim again.>
>     unmap_poisoned_folio
>    <folio is put buck.>
>      pr_warn_ratelimit(folio_mapped)
> 
> But I might be miss something. And even this race is possible, it should be really hard to hit.

Does try_to_unmap() care about isolation? Skimming over the code, I 
don't think so. I assume once we take the folio lock, races with reclaim 
are impossible.

In any case, the race is unexpected, so pr_warn_() would be helpful and 
not harmful.

Memory offlining code will later simply skip all PageHWPoison() pages, 
independent of the refcount as it seems. Failing to unmap might not be 
handled correctly at all ... I think this might be problematic in other 
regard (e.g., GUP references), but failing to unmap is "obviously" bad I 
think :)

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ