Message-ID: <7e89702d-c52c-4716-9cd6-33aebade1c71@arm.com>
Date: Fri, 3 Jan 2025 13:47:15 +0530
From: Dev Jain <dev.jain@....com>
To: David Hildenbrand <david@...hat.com>, akpm@...ux-foundation.org,
 willy@...radead.org, kirill.shutemov@...ux.intel.com
Cc: ryan.roberts@....com, anshuman.khandual@....com, catalin.marinas@....com,
 cl@...two.org, vbabka@...e.cz, mhocko@...e.com, apopple@...dia.com,
 dave.hansen@...ux.intel.com, will@...nel.org, baohua@...nel.org,
 jack@...e.cz, srivatsa@...il.mit.edu, haowenchao22@...il.com,
 hughd@...gle.com, aneesh.kumar@...nel.org, yang@...amperecomputing.com,
 peterx@...hat.com, ioworker0@...il.com, wangkefeng.wang@...wei.com,
 ziy@...dia.com, jglisse@...gle.com, surenb@...gle.com,
 vishal.moola@...il.com, zokeefe@...gle.com, zhengqi.arch@...edance.com,
 jhubbard@...dia.com, 21cnbao@...il.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 09/12] khugepaged: Introduce vma_collapse_anon_folio()


On 02/01/25 5:03 pm, David Hildenbrand wrote:
>>>>
>>>> When having to back-off (restore original PTEs), or for copying,
>>>> you'll likely need access to the original PTEs, which were already
>>>> cleared. So likely you need a temporary copy of the original PTEs
>>>> somehow.
>>>>
>>>> That's why temporarily clearing the PMD under the mmap write lock is easier
>>>> to implement, at the cost of requiring the mmap lock in write mode
>>>> like PMD collapse.
>>
>> Why do I need to clear the PMD if I am taking the mmap_write_lock() and
>> operating only on the PTE?
>
> One approach I proposed to Nico (and I think he has a prototype) is:
>
> a) Take all locks like we do today (mmap in write, vma in write, rmap 
> in write)
>
> After this step, no "ordinary" page table walkers can run anymore
>
> b) Clear the PMD entry and flush the TLB like we do today
>
> After this step, neither the CPU can read/write folios nor GUP-fast 
> can run. The PTE table is completely isolated.
>
> c) Now we can work on the (temporarily cleared) PTE table as we 
> please: isolate folios, lock them, ... without clearing the PTE 
> entries, just like we do today.
>
> d) Allocate the new folios (we don't have to hold any spinlocks), copy 
> + replace the affected PTE entries in the isolated PTE table. Similar 
> to what we do today, except that we don't clear PTEs but instead 
> clear+reset.
>
> e) Unlock+un-isolate + unref the collapsed folios like we do today.
>
> f) Re-map the PTE-table, like we do today when collapse would have 
> failed.
>
>
> Of course, after taking all locks we have to re-verify that there is 
> something to collapse (e.g., in d) we also have to check for 
> unexpected folio references). The backup path is easy: remap the PTE 
> table as no PTE entries were touched just yet.
>
> Observe that many things are "like we do today".
>
>
> As soon as we go to read locks + PTE locks, it all gets more 
> complicated to get it right. Not that it cannot be done, but the above 
> is IMHO a lot simpler to get right.

Thanks for the reply. I'll go ahead with the write lock algorithm then.
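
For illustration, steps a) to f) above can be modelled in plain userspace C.
This is only a toy sketch under stated assumptions: toy_mm, toy_pte, toy_page,
toy_large and toy_collapse() are hypothetical names, not kernel structures or
APIs. A single flag stands in for the mmap/vma/rmap write locks, the TLB flush
is a comment, and the caller passes in the already "allocated" large folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PTES 8   /* toy size; a real PTE table holds more entries */

/* All types below are hypothetical stand-ins, not kernel structures. */
struct toy_page  { int refcount; char data; };
struct toy_large { char data[NR_PTES]; };

struct toy_pte {
	struct toy_page  *page;   /* small folio mapped here, or NULL */
	struct toy_large *large;  /* large folio mapped here, or NULL */
};

struct toy_mm {
	bool write_locked;            /* models mmap + vma + rmap write locks */
	bool pmd_present;             /* PMD entry currently points at the table */
	struct toy_pte pte[NR_PTES];
};

/* Collapse PTEs [start, start + nr) into new_folio; true on success. */
static bool toy_collapse(struct toy_mm *mm, size_t start, size_t nr,
			 struct toy_large *new_folio)
{
	bool ok = true;
	size_t i;

	assert(mm && new_folio);
	if (start > NR_PTES || nr > NR_PTES - start)
		return false;

	/* a) Take all locks in write mode: no ordinary walkers can run. */
	mm->write_locked = true;

	/* b) Clear the PMD entry (real code would also flush the TLB here). */
	mm->pmd_present = false;

	/* c) Re-verify on the isolated table; no PTE is modified yet. */
	for (i = start; i < start + nr; i++) {
		struct toy_page *p = mm->pte[i].page;

		if (!p || p->refcount != 1) {   /* unexpected reference */
			ok = false;
			break;
		}
	}

	if (ok) {
		for (i = start; i < start + nr; i++) {
			struct toy_page *old = mm->pte[i].page;

			/* d) Copy, then replace the PTE (clear + reset). */
			new_folio->data[i - start] = old->data;
			mm->pte[i].page = NULL;
			mm->pte[i].large = new_folio;

			/* e) Drop the reference the old mapping held. */
			old->refcount--;
		}
	}

	/* f) Re-map the PTE table; this is also the back-off path. */
	mm->pmd_present = true;
	mm->write_locked = false;
	return ok;
}
```

In this model the failure path needs no PTE backup: if the check in c) fails,
no entry has been touched yet, so re-installing the PMD is the whole undo.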
