linux-kernel - Re: [PATCH 0/6] mm: split underutilized THPs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <907d1de9-7a38-4b6a-ba1f-eafa1b4c2783@redhat.com>
Date: Tue, 6 Aug 2024 19:33:26 +0200
From: David Hildenbrand <david@...hat.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Usama Arif <usamaarif642@...il.com>, akpm@...ux-foundation.org,
 linux-mm@...ck.org, riel@...riel.com, shakeel.butt@...ux.dev,
 roman.gushchin@...ux.dev, yuzhao@...gle.com, baohua@...nel.org,
 ryan.roberts@....com, rppt@...nel.org, willy@...radead.org,
 cerasuolodomenico@...il.com, corbet@....net, linux-kernel@...r.kernel.org,
 linux-doc@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH 0/6] mm: split underutilized THPs

On 06.08.24 19:28, Johannes Weiner wrote:
> On Thu, Aug 01, 2024 at 08:36:32AM +0200, David Hildenbrand wrote:
>> I just added another printf to postcopy_ram_supported_by_host(), where
>> we temporarily do a UFFDIO_REGISTER on some test area.
>>
>> Sensing UFFD support # postcopy_ram_supported_by_host()
>> Sensing UFFD support
>> Writing received pages during precopy # ram_load_precopy()
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Disabling THP: MADV_NOHUGEPAGE # postcopy_ram_prepare_discard()
>> Discarding pages # loadvm_postcopy_ram_handle_discard()
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Registering UFFD # postcopy_ram_incoming_setup()
>>
>> We could think about using this "ever user uffd" to avoid the shared
>> zeropage in most processes.
>>
>> Of course, there might be other applications where that wouldn't work,
>> but I think this behavior (write to area before enabling uffd) might be
>> fairly QEMU specific already.
> 
> It makes me a bit uneasy to hardcode this into the kernel. It's fairly
> specific to qemu/criu, and won't protect usecases that behave slightly
> differently.
> 
> It would also give userfaultfd users that aren't susceptible to this
> particular scenario a different code path.

True, but let's be honest, it's not like there are a million userfaultfd 
users out there :)

Anyhow ...

> 
>> Avoiding the shared zeropage has the benefit that a later write fault
>> won't have to do a TLB flush and can simply install a fresh anon page.
> 
> That's true - although if that happens frequently, it's something we
> might want to tune the shrinker for anyway. If subpages do get used
> later, we probably shouldn't have split the THP to begin with.
> 
> IMO the safest bet would be to use the zero page unconditionally.
> 

... I don't disagree. It also smells more like an optimization on top, 
if ever.

>>> 		return false;
>>>
>>> 	newpte = pte_mkspecial(pfn_pte(page_to_pfn(ZERO_PAGE(pvmw->address)),
>>> 					pvmw->vma->vm_page_prot));
>>>
>>> 	set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
>>
>> We're replacing a present page by another present page without doing a
>> TLB flush in between. I *think* this should be fine because the new
>> present page is R/O and cannot possibly be written to.
> 
> It's safe because it's replacing a migration entry. The TLB was
> flushed when that was installed, and since the migration pte is not
> marked present it couldn't have re-established a TLB entry.

Oh, right, we're dealing with a migration entry.

-- 
Cheers,

David / dhildenb