Message-ID: <e4e02b6b-2bc4-449e-87f9-43f4be269626@redhat.com>
Date: Thu, 2 Jan 2025 12:33:05 +0100
From: David Hildenbrand <david@...hat.com>
To: Dev Jain <dev.jain@....com>, akpm@...ux-foundation.org,
 willy@...radead.org, kirill.shutemov@...ux.intel.com
Cc: ryan.roberts@....com, anshuman.khandual@....com, catalin.marinas@....com,
 cl@...two.org, vbabka@...e.cz, mhocko@...e.com, apopple@...dia.com,
 dave.hansen@...ux.intel.com, will@...nel.org, baohua@...nel.org,
 jack@...e.cz, srivatsa@...il.mit.edu, haowenchao22@...il.com,
 hughd@...gle.com, aneesh.kumar@...nel.org, yang@...amperecomputing.com,
 peterx@...hat.com, ioworker0@...il.com, wangkefeng.wang@...wei.com,
 ziy@...dia.com, jglisse@...gle.com, surenb@...gle.com,
 vishal.moola@...il.com, zokeefe@...gle.com, zhengqi.arch@...edance.com,
 jhubbard@...dia.com, 21cnbao@...il.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 09/12] khugepaged: Introduce vma_collapse_anon_folio()

>>>
>>> When having to back-off (restore original PTEs), or for copying,
>>> you'll likely need access to the original PTEs, which were already
>>> cleared. So likely you need a temporary copy of the original PTEs
>>> somehow.
>>>
>>> That's why temporarily clearing the PMD under the mmap write lock is easier
>>> to implement, at the cost of requiring the mmap lock in write mode
>>> like PMD collapse.
> 
> Why do I need to clear the PMD if I am taking the mmap_write_lock() and
> operating only on the PTE?

One approach I proposed to Nico (and I think he has a prototype) is:

a) Take all locks like we do today (mmap in write, vma in write, rmap in 
write)

After this step, no "ordinary" page table walkers can run anymore

b) Clear the PMD entry and flush the TLB like we do today

After this step, the CPU can no longer read/write the folios and GUP-fast 
can no longer run. The PTE table is completely isolated.

c) Now we can work on the (temporarily isolated) PTE table as we please: 
isolate folios, lock them, ... without clearing the PTE entries, just 
like we do today.

d) Allocate the new folios (we don't have to hold any spinlocks), copy + 
replace the affected PTE entries in the isolated PTE table. Similar to 
what we do today, except that we don't clear PTEs but instead clear+reset.

e) Unlock+un-isolate + unref the collapsed folios like we do today.

f) Re-map the PTE table, like we do today when collapse fails.


Of course, after taking all locks we have to re-verify that there is 
something to collapse (e.g., in step d we also have to check for 
unexpected folio references). The back-off path is easy: remap the PTE 
table, as no PTE entries have been touched yet.
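
To make the ordering of a)-f) and the back-off path concrete, here is a 
rough userspace toy model. It is only a sketch: every name in it 
(toy_mm, toy_pmd, toy_collapse(), ...) is made up for illustration and 
none of it is kernel API. It does no real locking, TLB flushing, or 
copying, and it assumes a folio is "unexpectedly referenced" whenever 
its refcount differs from a stored expected value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PTES 8	/* hypothetical; a real PTE table has more entries */

struct toy_folio { int refcount; int expected_refs; };
struct toy_pte_table { struct toy_folio *pte[TOY_NR_PTES]; };
struct toy_pmd { struct toy_pte_table *table; };	/* NULL == cleared */
struct toy_mm { bool mmap_write_locked, vma_write_locked, rmap_write_locked; };

/*
 * Try to replace PTEs [start, start + nr) with one new folio.
 * Returns true on success; on failure the PTE table is left untouched.
 */
bool toy_collapse(struct toy_mm *mm, struct toy_pmd *pmd,
		  size_t start, size_t nr, struct toy_folio *new_folio)
{
	struct toy_pte_table *table;
	bool ok = true;
	size_t i;

	/* a) Take all locks in write mode (modelled as flags only). */
	mm->mmap_write_locked = true;
	mm->vma_write_locked = true;
	mm->rmap_write_locked = true;

	/* b) Clear the PMD entry; the TLB flush would go here. */
	table = pmd->table;
	pmd->table = NULL;
	assert(pmd->table == NULL);	/* table is now isolated */

	/* c) Re-verify on the isolated table without modifying any PTE. */
	if (!table || start > TOY_NR_PTES || nr > TOY_NR_PTES - start)
		ok = false;
	for (i = start; ok && i < start + nr; i++) {
		struct toy_folio *f = table->pte[i];

		if (!f || f->refcount != f->expected_refs)
			ok = false;	/* unexpected reference: back off */
	}

	/* d) + e) Replace the affected PTEs and unref the old folios. */
	for (i = start; ok && i < start + nr; i++) {
		table->pte[i]->refcount--;
		table->pte[i] = new_folio;
		new_folio->refcount++;
	}

	/* f) Re-map the PTE table; this is also the whole back-off path. */
	pmd->table = table;

	mm->rmap_write_locked = false;
	mm->vma_write_locked = false;
	mm->mmap_write_locked = false;
	return ok;
}
```

Note that in this model the failure case needs no undo logic at all: 
verification finishes before the first PTE is modified, so backing off 
is just step f).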

Observe that many things are "like we do today".


As soon as we go to read locks + PTE locks, it all gets more complicated 
to get it right. Not that it cannot be done, but the above is IMHO a lot 
simpler to get right.

-- 
Cheers,

David / dhildenb

