linux-kernel - Re: [QUESTION] anon_vma lock in khugepaged

Message-ID: <ea15f3d3-5dd8-4404-8dab-5673bb5f3413@arm.com>
Date: Thu, 5 Dec 2024 15:40:08 +0530
From: Dev Jain <dev.jain@....com>
To: ryan.roberts@....com, david@...hat.com, kirill.shutemov@...ux.intel.com,
 willy@...radead.org, ziy@...dia.com, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org
Subject: Re: [QUESTION] anon_vma lock in khugepaged


On 28/11/24 11:56 am, Dev Jain wrote:
> Hi, I was looking at the khugepaged code and cannot figure out what the problem would be
> if we took the mmap lock in read mode. Shouldn't simply taking the PMD lock, then the PTL,
> then unlocking the PTL, then unlocking the PMD lock, be enough to handle any races with page table walkers?
>
>
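The lock ordering described in the quoted question (PMD lock outer, PTL inner, released in reverse) can be sketched in user space with pthread mutexes. This is a hypothetical model, not kernel code: `struct toy_pt_locks` and the `toy_*` functions are made up for illustration. It only shows the ordering and does not claim that this ordering alone is sufficient, which is the open question.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-ins for the PMD lock and the PTE
 * lock (PTL); these are NOT kernel APIs. The two pointers may refer
 * to the same mutex, e.g. when split page table locks are not in use. */
struct toy_pt_locks {
	pthread_mutex_t *pmd_lock;
	pthread_mutex_t *ptl;
};

/* Take the outer (PMD) lock first, then the inner PTL. Skip the
 * second acquisition if both are the same lock, since a default
 * mutex must not be locked twice by the same thread. */
static void toy_lock_pmd_then_ptl(struct toy_pt_locks *l)
{
	assert(l->pmd_lock && l->ptl);
	pthread_mutex_lock(l->pmd_lock);
	if (l->ptl != l->pmd_lock)
		pthread_mutex_lock(l->ptl);
}

/* Release in reverse order: PTL first, then the PMD lock. */
static void toy_unlock_ptl_then_pmd(struct toy_pt_locks *l)
{
	if (l->ptl != l->pmd_lock)
		pthread_mutex_unlock(l->ptl);
	pthread_mutex_unlock(l->pmd_lock);
}
```

The same-lock check mirrors the fact that, depending on configuration, the PMD lock and the PTL can be one and the same lock.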

Similar questions:

1. Why do we need anon_vma_lock_write() in collapse_huge_page()? AFAIK, anon_vmas need to be walked either
    when forking or when unmapping a folio, where we must find all VMAs mapping it. The former path takes
    mmap_write_lock(), so there is no problem. For the latter, if we only took anon_vma_lock_read(), kswapd
    could isolate the folio from the LRU, walk the rmap and swap the folio out, and khugepaged would then fail in
    folio_isolate_lru(). That is not a fatal problem, just a missed collapse due to a race (and this code is
    racy anyway). What am I missing?

2. In which scenarios does rmap come into play? Fork and swap-out; am I missing any others?

3. Please confirm whether this is correct: in stark contrast to page migration, we do not need to walk the rmap
    and nuke all PTEs referencing the folio, because for anonymous (non-shmem) folios the only way a folio can be
    shared is via fork. In that case, folio_put() in __collapse_huge_page_copy_succeeded() -> free_page_and_swap_cache()
    will not release the folio, so the old folio remains and child processes can still read from it. Page migration,
    by contrast, requires that we be able to free the old folios.
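The assumptions in points 1 and 3 can be illustrated with a toy user-space model: isolation as a test-and-clear of an LRU flag (so a racing loser just gets `false` and skips the collapse), and a per-mapping refcount (so the collapsing process's put does not free a folio a forked child still maps). Everything here is hypothetical: `struct toy_folio` and the `toy_*` helpers are made up, and the model says nothing about whether the kernel's actual locking can be relaxed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy folio: NOT the kernel's struct folio. */
struct toy_folio {
	atomic_bool on_lru; /* models the LRU flag */
	int refcount;       /* one reference per process mapping it */
	bool freed;
};

/* Point 1: isolation is a test-and-clear of the LRU flag, so if two
 * parties (say kswapd and khugepaged) race, exactly one wins and the
 * other simply gets false back. */
static bool toy_isolate_lru(struct toy_folio *f)
{
	return atomic_exchange(&f->on_lru, false);
}

/* Point 3: another process takes a reference, e.g. a forked child
 * mapping the same anonymous folio. */
static void toy_get(struct toy_folio *f)
{
	assert(!f->freed);
	f->refcount++;
}

/* Drop one reference; the folio is freed only when the last one goes,
 * so the collapsing process's put leaves a child's mapping intact. */
static void toy_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->freed = true;
}
```

In this model, a parent maps the folio (refcount 1), fork adds the child's reference (2), the parent's collapse drops its reference (1, not freed), and only the child's final put frees it.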