[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <eebe218a-f381-3a24-36c5-e1472b91a1ce@huawei.com>
Date: Wed, 22 Jan 2025 15:38:56 +0800
From: Miaohe Lin <linmiaohe@...wei.com>
To: David Hildenbrand <david@...hat.com>
CC: <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
<akpm@...ux-foundation.org>, <osalvador@...e.de>, <nao.horiguchi@...il.com>,
<mhocko@...e.com>, Wupeng Ma <mawupeng1@...wei.com>
Subject: Re: [PATCH v2 1/3] mm: memory-failure: update ttu flag inside
unmap_poisoned_folio
On 2025/1/21 15:58, David Hildenbrand wrote:
> On 21.01.25 04:20, Miaohe Lin wrote:
>> On 2025/1/20 16:46, David Hildenbrand wrote:
>>> On 20.01.25 08:49, David Hildenbrand wrote:
>>>>
>>>>>> if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
>>>>>> struct address_space *mapping;
>>>>>> @@ -1572,7 +1598,7 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
>>>>>> if (!mapping) {
>>>>>> pr_info("%#lx: could not lock mapping for mapped hugetlb folio\n",
>>>>>> folio_pfn(folio));
>>>>>> - return;
>>>>>> + return -EBUSY;
>>>>>> }
>>>>>> try_to_unmap(folio, ttu|TTU_RMAP_LOCKED);
>>>>>> @@ -1580,6 +1606,8 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
>>>>>> } else {
>>>>>> try_to_unmap(folio, ttu);
>>>>>> }
>>>>>> +
>>>>>> + return folio_mapped(folio) ? -EBUSY : 0;
>>>>>
>>>>> Do we really need this return value? It's unused in do_migrate_range().
>>>>
>>>> I suggested it, because the folio_mapped() is nowadays extremely cheap.
>>>> It cleans up hwpoison_user_mappings() quite nicely.
>>>
>>> I'm also wondering, if in do_migrate_range(), we want to pr_warn_ratelimit() in case still mapped after the call. IIUC, we don't really expect this to happen with SYNC set.
>>
>> Do you mean TTU_SYNC? It seems it's not set.
>
> With your patch it will be now, which is the right thing to do I think.
>
>>
>> There might be a race will hit the proposed pr_warn_ratelimit():
>>
>> /* Assume folio is isolated for reclaim, so memory_failure failed to handle it at first time. Then it's put back to LRU. */
>> do_migrate_range
>> folio_test_hwpoison
>> folio_mapped
>> <folio is isolated for reclaim again.>
>> unmap_poisoned_folio
>> <folio is put buck.>
>> pr_warn_ratelimit(folio_mapped)
>>
>> But I might be miss something. And even this race is possible, it should be really hard to hit.
>
> Does try_to_unmap() care about isolation? Skimming over the code, I don't think so. I assume once we take the folio lock, races with reclaim are impossible.
I think you're right. I missed folio lock in above race.
>
> In any case, the race is unexpected, so pr_warn_() would be helpful and not harmful.
>
> Memory offlining code will later simply skip all PageHWPoison() pages, independent of the refcount as it seems. Failing to unmap might not be handled correctly at all ... I think this might be problematic in other regard (e.g., GUP references), but failing to unmap is "obviously" bad I think :)
Agree with you.
Thanks.
.
Powered by blists - more mailing lists