linux-kernel - Re: [PATCH] mm: migration :shared anonymous migration test is failing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <454f7c85-a6d9-4635-94ff-7fab69060a82@redhat.com>
Date: Fri, 20 Dec 2024 11:11:55 +0100
From: David Hildenbrand <david@...hat.com>
To: Donet Tom <donettom@...ux.ibm.com>,
 Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Cc: Ritesh Harjani <ritesh.list@...il.com>,
 Baolin Wang <baolin.wang@...ux.alibaba.com>,
 "Aneesh Kumar K . V" <aneesh.kumar@...nel.org>, Zi Yan <ziy@...dia.com>,
 shuah Khan <shuah@...nel.org>, Dev Jain <dev.jain@....com>
Subject: Re: [PATCH] mm: migration :shared anonymous migration test is failing

On 20.12.24 03:55, Donet Tom wrote:
> 
> On 12/19/24 18:28, David Hildenbrand wrote:
>> On 19.12.24 13:47, Donet Tom wrote:
>>> The migration selftest is currently failing for shared anonymous
>>> mappings due to a race condition.
>>>
>>> During migration, the source folio's PTE is unmapped by nuking the
>>> PTE, flushing the TLB,and then marking the page for migration
>>> (by creating the swap entries). The issue arises when, immediately
>>> after the PTE is nuked and the TLB is flushed, but before the page
>>> is marked for migration, another thread accesses the page. This
>>> triggers a page fault, and the page fault handler invokes
>>> do_pte_missing() instead of do_swap_page(), as the page is not yet
>>> marked for migration.
>>>
>>> In the fault handling path, do_pte_missing() calls __do_fault()
>>> ->shmem_fault() -> shmem_get_folio_gfp() -> filemap_get_entry().
>>> This eventually calls folio_try_get(), incrementing the reference
>>> count of the folio undergoing migration. The thread then blocks
>>> on folio_lock(), as the migration path holds the lock. This
>>> results in the migration failing in __migrate_folio(), which expects
>>> the folio's reference count to be 2. However, the reference count is
>>> incremented by the fault handler, leading to the failure.
>>>
>>> The issue arises because, after nuking the PTE and before marking the
>>> page for migration, the page is accessed. To address this, we have
>>> updated the logic to first nuke the PTE, then mark the page for
>>> migration, and only then flush the TLB. With this patch, If the page is
>>> accessed immediately after nuking the PTE, the TLB entry is still
>>> valid, so no fault occurs.
>>
>> But what about if the PTE is not in the TLB yet, and you get an access
>> from another CPU just after clearing the PTE (but not flushing the
>> TLB)? The other CPU will still observe PTE=none, trigger a fault etc.
>>
> Yes, in this scenario, the migration will fail. Do you think the
> migration test
> failure, even after a retry, should be considered a major issue that
> must be fixed?

I think it is something we should definitely improve, but I think our 
page migration should handle this in a better way, not the unmap logic. 
I recall we discussed with Dev some ideas on how to improve that?

I'm pretty sure one can trigger similar case using a tmpfs file and 
using read/write in a loop instead of memory access -> page faults. So 
where racing with page faults is completely out of the picture.

-- 
Cheers,

David / dhildenb