Message-ID: <e4e02b6b-2bc4-449e-87f9-43f4be269626@redhat.com>
Date: Thu, 2 Jan 2025 12:33:05 +0100
From: David Hildenbrand <david@...hat.com>
To: Dev Jain <dev.jain@....com>, akpm@...ux-foundation.org,
willy@...radead.org, kirill.shutemov@...ux.intel.com
Cc: ryan.roberts@....com, anshuman.khandual@....com, catalin.marinas@....com,
cl@...two.org, vbabka@...e.cz, mhocko@...e.com, apopple@...dia.com,
dave.hansen@...ux.intel.com, will@...nel.org, baohua@...nel.org,
jack@...e.cz, srivatsa@...il.mit.edu, haowenchao22@...il.com,
hughd@...gle.com, aneesh.kumar@...nel.org, yang@...amperecomputing.com,
peterx@...hat.com, ioworker0@...il.com, wangkefeng.wang@...wei.com,
ziy@...dia.com, jglisse@...gle.com, surenb@...gle.com,
vishal.moola@...il.com, zokeefe@...gle.com, zhengqi.arch@...edance.com,
jhubbard@...dia.com, 21cnbao@...il.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 09/12] khugepaged: Introduce vma_collapse_anon_folio()
>>>
>>> When having to back-off (restore original PTEs), or for copying,
>>> you'll likely need access to the original PTEs, which were already
>>> cleared. So likely you need a temporary copy of the original PTEs
>>> somehow.
>>>
>>> That's why temporarily clearing the PMD under the mmap write lock is easier
>>> to implement, at the cost of requiring the mmap lock in write mode
>>> like PMD collapse.
>
> Why do I need to clear the PMD if I am taking the mmap_write_lock() and
> operating only on the PTE?
One approach I proposed to Nico (and I think he has a prototype) is:
a) Take all locks like we do today (mmap lock, VMA lock and rmap lock, all
in write mode)
After this step, no "ordinary" page table walkers can run anymore.
b) Clear the PMD entry and flush the TLB like we do today
After this step, neither can the CPU read/write the folios, nor can GUP-fast
run on them. The PTE table is completely isolated.
c) Now we can work on the (temporarily unmapped) PTE table as we please:
isolate folios, lock them, ... without clearing the PTE entries, just
like we do today.
d) Allocate the new folios (we don't have to hold any spinlocks), copy the
contents + replace the affected PTE entries in the isolated PTE table. This is
similar to what we do today, except that instead of only clearing PTEs we
clear+reset them.
e) Unlock+un-isolate + unref the collapsed folios like we do today.
f) Re-map the PTE-table, like we do today when collapse would have failed.
Of course, after taking all locks we have to re-verify that there is
something to collapse (e.g., in d) we also have to check for unexpected
folio references). The back-off path is easy: remap the PTE table, as no
PTE entries have been touched yet.
Observe that many things are "like we do today".
As soon as we switch to read locks + PTE locks, it all gets more complicated
to get right. Not that it cannot be done, but the above is IMHO a lot
simpler to get right.
--
Cheers,
David / dhildenb