[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <907d1de9-7a38-4b6a-ba1f-eafa1b4c2783@redhat.com>
Date: Tue, 6 Aug 2024 19:33:26 +0200
From: David Hildenbrand <david@...hat.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Usama Arif <usamaarif642@...il.com>, akpm@...ux-foundation.org,
linux-mm@...ck.org, riel@...riel.com, shakeel.butt@...ux.dev,
roman.gushchin@...ux.dev, yuzhao@...gle.com, baohua@...nel.org,
ryan.roberts@....com, rppt@...nel.org, willy@...radead.org,
cerasuolodomenico@...il.com, corbet@....net, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH 0/6] mm: split underutilized THPs
On 06.08.24 19:28, Johannes Weiner wrote:
> On Thu, Aug 01, 2024 at 08:36:32AM +0200, David Hildenbrand wrote:
>> I just added another printf to postcopy_ram_supported_by_host(), where
>> we temporarily do a UFFDIO_REGISTER on some test area.
>>
>> Sensing UFFD support # postcopy_ram_supported_by_host()
>> Sensing UFFD support
>> Writing received pages during precopy # ram_load_precopy()
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Disabling THP: MADV_NOHUGEPAGE # postcopy_ram_prepare_discard()
>> Discarding pages # loadvm_postcopy_ram_handle_discard()
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Registering UFFD # postcopy_ram_incoming_setup()
>>
>> We could think about using this "ever user uffd" to avoid the shared
>> zeropage in most processes.
>>
>> Of course, there might be other applications where that wouldn't work,
>> but I think this behavior (write to area before enabling uffd) might be
>> fairly QEMU specific already.
>
> It makes me a bit uneasy to hardcode this into the kernel. It's fairly
> specific to qemu/criu, and won't protect usecases that behave slightly
> differently.
>
> It would also give userfaultfd users that aren't susceptible to this
> particular scenario a different code path.
True, but let's be honest, it's not like there are a million userfaultfd
users out there :)
Anyhow ...
>
>> Avoiding the shared zeropage has the benefit that a later write fault
>> won't have to do a TLB flush and can simply install a fresh anon page.
>
> That's true - although if that happens frequently, it's something we
> might want to tune the shrinker for anyway. If subpages do get used
> later, we probably shouldn't have split the THP to begin with.
>
> IMO the safest bet would be to use the zero page unconditionally.
>
... I don't disagree. It also smells more like an optimization on top,
if ever.
>>> return false;
>>>
>>> newpte = pte_mkspecial(pfn_pte(page_to_pfn(ZERO_PAGE(pvmw->address)),
>>> pvmw->vma->vm_page_prot));
>>>
>>> set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
>>
>> We're replacing a present page by another present page without doing a
>> TLB flush in between. I *think* this should be fine because the new
>> present page is R/O and cannot possibly be written to.
>
> It's safe because it's replacing a migration entry. The TLB was
> flushed when that was installed, and since the migration pte is not
> marked present it couldn't have re-established a TLB entry.
Oh, right, we're dealing with a migration entry.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists