Message-ID: <7e89702d-c52c-4716-9cd6-33aebade1c71@arm.com>
Date: Fri, 3 Jan 2025 13:47:15 +0530
From: Dev Jain <dev.jain@....com>
To: David Hildenbrand <david@...hat.com>, akpm@...ux-foundation.org,
willy@...radead.org, kirill.shutemov@...ux.intel.com
Cc: ryan.roberts@....com, anshuman.khandual@....com, catalin.marinas@....com,
cl@...two.org, vbabka@...e.cz, mhocko@...e.com, apopple@...dia.com,
dave.hansen@...ux.intel.com, will@...nel.org, baohua@...nel.org,
jack@...e.cz, srivatsa@...il.mit.edu, haowenchao22@...il.com,
hughd@...gle.com, aneesh.kumar@...nel.org, yang@...amperecomputing.com,
peterx@...hat.com, ioworker0@...il.com, wangkefeng.wang@...wei.com,
ziy@...dia.com, jglisse@...gle.com, surenb@...gle.com,
vishal.moola@...il.com, zokeefe@...gle.com, zhengqi.arch@...edance.com,
jhubbard@...dia.com, 21cnbao@...il.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 09/12] khugepaged: Introduce vma_collapse_anon_folio()

On 02/01/25 5:03 pm, David Hildenbrand wrote:
>>>>
>>>> When having to back-off (restore original PTEs), or for copying,
>>>> you'll likely need access to the original PTEs, which were already
>>>> cleared. So likely you need a temporary copy of the original PTEs
>>>> somehow.
>>>>
>>>> That's why temporarily clearing the PMD under mmap write lock is easier
>>>> to implement, at the cost of requiring the mmap lock in write mode
>>>> like PMD collapse.
>>
>> Why do I need to clear the PMD if I am taking the mmap_write_lock() and
>> operating only on the PTE?
>
> One approach I proposed to Nico (and I think he has a prototype) is:
>
> a) Take all locks like we do today (mmap in write, vma in write, rmap
> in write)
>
> After this step, no "ordinary" page table walkers can run anymore
>
> b) Clear the PMD entry and flush the TLB like we do today
>
> After this step, the CPU can no longer read/write the folios and
> GUP-fast can no longer run. The PTE table is completely isolated.
>
> c) Now we can work on the (temporarily cleared) PTE table as we
> please: isolate folios, lock them, ... without clearing the PTE
> entries, just like we do today.
>
> d) Allocate the new folios (we don't have to hold any spinlocks), copy
> + replace the affected PTE entries in the isolated PTE table. Similar
> to what we do today, except that we don't clear PTEs but instead
> clear+reset.
>
> e) Unlock+un-isolate + unref the collapsed folios like we do today.
>
> f) Re-map the PTE-table, like we do today when collapse would have
> failed.
>
>
> Of course, after taking all locks we have to re-verify that there is
> something to collapse (e.g., in step d we also have to check for
> unexpected folio references). The back-off path is easy: remap the PTE
> table, as no PTE entries were touched just yet.
>
> Observe that many things are "like we do today".
>
>
> As soon as we go to read locks + PTE locks, it all gets more
> complicated to get it right. Not that it cannot be done, but the above
> is IMHO a lot simpler to get right.

Thanks for the reply. I'll go ahead with the write lock algorithm then.
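
A rough userspace sketch of the ordering in steps a) to f) quoted above, only to make the sequence and the back-off path concrete. This is a toy model, not kernel code: every name in it (toy_mm, toy_pte, toy_collapse, TOY_NR_PTES) is hypothetical, the "locks" are a single flag, the TLB flush is not modelled, and the reference check is reduced to a per-entry counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PTES 8	/* toy size, not the real number of PTEs per table */

/* All types and functions below are hypothetical, not kernel APIs. */
struct toy_pte {
	int folio_id;	/* which folio this entry maps */
	int refcount;	/* 1 == no unexpected extra references */
};

struct toy_mm {
	bool write_locked;			/* stands in for mmap/vma/rmap write locks */
	struct toy_pte *pmd;			/* NULL == PMD entry cleared */
	struct toy_pte table[TOY_NR_PTES];	/* the PTE table */
};

static void toy_mm_init(struct toy_mm *mm)
{
	mm->write_locked = false;
	for (size_t i = 0; i < TOY_NR_PTES; i++) {
		mm->table[i].folio_id = (int)i;
		mm->table[i].refcount = 1;
	}
	mm->pmd = mm->table;
}

/* Collapse entries [start, start + nr) onto one new folio; false on back-off. */
static bool toy_collapse(struct toy_mm *mm, size_t start, size_t nr,
			 int new_folio_id)
{
	struct toy_pte *isolated;
	bool ok = true;
	size_t i;

	if (start + nr > TOY_NR_PTES)
		return false;

	/* a) take all locks in write mode */
	mm->write_locked = true;

	/* b) clear the PMD entry; the PTE table is now isolated */
	isolated = mm->pmd;
	mm->pmd = NULL;

	/* c) re-verify on the isolated table without clearing any PTE */
	for (i = start; i < start + nr; i++) {
		if (isolated[i].refcount != 1) {
			ok = false;	/* unexpected reference: back off */
			break;
		}
	}

	/* d) copy + replace the affected entries (copy itself not modelled) */
	if (ok) {
		for (i = start; i < start + nr; i++)
			isolated[i].folio_id = new_folio_id;
	}

	/* e) unlock/unref of the old folios is not modelled here */

	/* f) re-map the PTE table; on back-off it is still untouched */
	mm->pmd = isolated;
	mm->write_locked = false;
	return ok;
}
```

The point of the model is that the failure path and the success path end the same way: the PTE table is re-installed in the PMD, and on failure no entry has been modified, so nothing needs to be restored from a temporary copy.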