Message-ID: <8cae1e56-239f-4f67-a18c-b4f4d09f40d0@redhat.com>
Date: Mon, 3 Mar 2025 21:49:48 +0100
From: David Hildenbrand <david@...hat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 Peter Xu <peterx@...hat.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
 Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org,
 Matthew Wilcox <willy@...radead.org>, Olivier Dion <odion@...icios.com>,
 linux-mm@...ck.org
Subject: Re: [RFC PATCH 0/2] SKSM: Synchronous Kernel Samepage Merging

On 03.03.25 21:01, Mathieu Desnoyers wrote:
> On 2025-02-28 17:32, Peter Xu wrote:
>> On Fri, Feb 28, 2025 at 12:53:02PM -0500, Mathieu Desnoyers wrote:
>>> On 2025-02-28 11:32, Peter Xu wrote:
>>>> On Fri, Feb 28, 2025 at 09:59:00AM -0500, Mathieu Desnoyers wrote:
>>>>> For the VM use-case, I wonder if we could just add a userfaultfd
>>>>> "COW" event that would notify userspace when a COW happens ?
>>>>
>>>> I don't know what's the best for KSM and how well this will work, but we
>>>> have such event for years..  See UFFDIO_REGISTER_MODE_WP:
>>>>
>>>> https://man7.org/linux/man-pages/man2/userfaultfd.2.html
>>>
>>> userfaultfd UFFDIO_REGISTER only seems to work if I pass an address
>>> resulting from a mmap mapping, but returns EINVAL if I pass a
>>> page-aligned address which sits within a private file mapping
>>> (e.g. executable data).
>>
>> Yes, so far sync traps only support RAM-based file systems or anonymous
>> memory.  Generic private file mappings (that store executables and
>> libraries) are not yet supported.
>>
>>>
>>> Also, I notice that do_wp_page() only calls handle_userfault
>>> VM_UFFD_WP when vm_fault flags does not have FAULT_FLAG_UNSHARE
>>> set.
>>
>> AFAICT that's expected, unshare should only be set on reads, never writes.
>> So uffd-wp shouldn't trap any of those.
>>
>>>
>>> AFAIU, as it stands now userfaultfd would not help tracking COW faults
>>> caused by stores to private file mappings. Am I missing something ?
>>
>> I think you're right.  So we have UFFD_FEATURE_WP_ASYNC that should work on
>> most mappings.  That one is async, though, so more like soft-dirty.  It
>> might be doable to try making it sync too without a lot of changes based on
>> how async tracking works.
> 
> I'm looking more closely at admin-guide/mm/pagemap.rst and it appears to
> be a good fit. Here is what I have in mind to replace the ksmd scanning
> thread for the VM use-case by a purely user-space driven scanning:
> 
> Within qemu or similar user-space process:
> 
> 1) Track guest memory with the userfaultfd UFFD_FEATURE_WP_ASYNC feature and
>      UFFDIO_REGISTER_MODE_WP mode.
> 
> 2) Protect user-space memory with the PAGEMAP_SCAN ioctl PM_SCAN_WP_MATCHING flag
>      to detect memory which stays invariant for a long time.
> 
> 3) Use the PAGEMAP_SCAN ioctl with PAGE_IS_WRITTEN to detect which pages are written to.
>      Keep track of memory which is frequently modified, so it can be left alone and
>      not write-protected nor merged anymore.
> 
> 4) Whenever pages stay invariant for a given lapse of time, merge them with the new
>      madvise(2) KSM_MERGE behavior.
> 
> Let me know if that makes sense.

Note that one of the strengths of ksm in the kernel right now is that we 
write-protect + try-deduplicate only when we are fairly sure that we can 
deduplicate (unstable tree), and that the interaction with THPs / large 
folios is fairly well thought-through.

Also note that just because data hasn't been written for some time
interval doesn't mean that it should be deduplicated, only to result in
CoW on the next write access.

One would probably have to mimic what the KSM implementation in the
kernel does, and build something like the unstable tree, to find
candidates where we can actually deduplicate. Then, have a way to
not-deduplicate if the content changed.
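
To make the candidate-selection idea a bit more concrete, here is a rough
user-space sketch in C. It is only an illustration under stated
assumptions, not how mm/ksm.c is written: the names page_state,
page_checksum, scan_pass and SKETCH_PAGE_SIZE are hypothetical, a plain
linear search stands in for the kernel's content-ordered tree, and no
kernel interface is called. A page becomes a candidate only if its
checksum is unchanged since the previous pass, and a match is reported
only after a full content comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size, for illustration */

/* Hypothetical per-page bookkeeping kept by the scanner. */
struct page_state {
    uint32_t last_sum;  /* checksum observed during the previous pass */
    int      seen;      /* nonzero once last_sum holds a valid value */
    int      stable;    /* nonzero if unchanged between the last two passes */
};

/* FNV-1a over one page; any cheap checksum would do for this sketch. */
static uint32_t page_checksum(const unsigned char *page)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * One scan pass over npages pages starting at mem.
 * match[i] is set to the index of an earlier stable page with identical
 * content, or -1 if page i is volatile or has no such partner.
 * Returns the number of pages for which a partner was found.
 */
static size_t scan_pass(const unsigned char *mem, size_t npages,
                        struct page_state *st, long *match)
{
    size_t found = 0;

    assert(mem && st && match);
    for (size_t i = 0; i < npages; i++) {
        const unsigned char *p = mem + i * SKETCH_PAGE_SIZE;
        uint32_t sum = page_checksum(p);

        /* Volatile pages (checksum changed, or first sighting) are skipped. */
        st[i].stable = st[i].seen && st[i].last_sum == sum;
        st[i].last_sum = sum;
        st[i].seen = 1;
        match[i] = -1;
        if (!st[i].stable)
            continue;

        /* Linear search standing in for an unstable-tree lookup. */
        for (size_t j = 0; j < i; j++) {
            const unsigned char *q = mem + j * SKETCH_PAGE_SIZE;
            /* Full compare: never trust the checksum alone. */
            if (st[j].stable && memcmp(p, q, SKETCH_PAGE_SIZE) == 0) {
                match[i] = (long)j;
                found++;
                break;
            }
        }
    }
    return found;
}
```

A caller would run scan_pass() periodically and only hand the (i,
match[i]) pairs to whatever merge primitive is available. Note that in
user space the content can still change between the memcmp() and the
merge, so the merge step itself would have to re-verify the content
under write protection, which is the "way to not-deduplicate" mentioned
above.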

-- 
Cheers,

David / dhildenb

