[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2fc40750-2ff7-4e73-ba52-c4d9caaa4f4f@redhat.com>
Date: Tue, 21 Jan 2025 08:58:41 +0100
From: David Hildenbrand <david@...hat.com>
To: Miaohe Lin <linmiaohe@...wei.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org, osalvador@...e.de, nao.horiguchi@...il.com,
mhocko@...e.com, Wupeng Ma <mawupeng1@...wei.com>
Subject: Re: [PATCH v2 1/3] mm: memory-failure: update ttu flag inside
unmap_poisoned_folio
On 21.01.25 04:20, Miaohe Lin wrote:
> On 2025/1/20 16:46, David Hildenbrand wrote:
>> On 20.01.25 08:49, David Hildenbrand wrote:
>>>
>>>>> if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
>>>>> struct address_space *mapping;
>>>>> @@ -1572,7 +1598,7 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
>>>>> if (!mapping) {
>>>>> pr_info("%#lx: could not lock mapping for mapped hugetlb folio\n",
>>>>> folio_pfn(folio));
>>>>> - return;
>>>>> + return -EBUSY;
>>>>> }
>>>>> try_to_unmap(folio, ttu|TTU_RMAP_LOCKED);
>>>>> @@ -1580,6 +1606,8 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
>>>>> } else {
>>>>> try_to_unmap(folio, ttu);
>>>>> }
>>>>> +
>>>>> + return folio_mapped(folio) ? -EBUSY : 0;
>>>>
>>>> Do we really need this return value? It's unused in do_migrate_range().
>>>
>>> I suggested it, because the folio_mapped() is nowadays extremely cheap.
>>> It cleans up hwpoison_user_mappings() quite nicely.
>>
>> I'm also wondering, if in do_migrate_range(), we want to pr_warn_ratelimit() in case still mapped after the call. IIUC, we don't really expect this to happen with SYNC set.
>
> Do you mean TTU_SYNC? It seems it's not set.
With your patch it will be now, which is the right thing to do I think.
>
> There might be a race will hit the proposed pr_warn_ratelimit():
>
> /* Assume folio is isolated for reclaim, so memory_failure failed to handle it at first time. Then it's put back to LRU. */
> do_migrate_range
> folio_test_hwpoison
> folio_mapped
> <folio is isolated for reclaim again.>
> unmap_poisoned_folio
> <folio is put buck.>
> pr_warn_ratelimit(folio_mapped)
>
> But I might be miss something. And even this race is possible, it should be really hard to hit.
Does try_to_unmap() care about isolation? Skimming over the code, I
don't think so. I assume once we take the folio lock, races with reclaim
are impossible.
In any case, the race is unexpected, so pr_warn_() would be helpful and
not harmful.
Memory offlining code will later simply skip all PageHWPoison() pages,
independent of the refcount as it seems. Failing to unmap might not be
handled correctly at all ... I think this might be problematic in other
regard (e.g., GUP references), but failing to unmap is "obviously" bad I
think :)
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists