[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <40112a27-eddb-4c1a-a859-a34e202e6564@arm.com>
Date: Tue, 23 Jan 2024 12:28:25 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Matthew Wilcox <willy@...radead.org>, Russell King <linux@...linux.org.uk>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Dinh Nguyen <dinguyen@...nel.org>, Michael Ellerman <mpe@...erman.id.au>,
Nicholas Piggin <npiggin@...il.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
"Aneesh Kumar K.V" <aneesh.kumar@...nel.org>,
"Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>,
Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt
<palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, "David S. Miller"
<davem@...emloft.net>, linux-arm-kernel@...ts.infradead.org,
linuxppc-dev@...ts.ozlabs.org, linux-riscv@...ts.infradead.org,
linux-s390@...r.kernel.org, sparclinux@...r.kernel.org
Subject: Re: [PATCH v1 09/11] mm/memory: optimize fork() with PTE-mapped THP
On 23/01/2024 12:19, David Hildenbrand wrote:
> [...]
>
>>
>> I wrote some documentation for this (based on Matthew's docs for set_ptes() in
>> my version. Perhaps it makes sense to add it here, given this is overridable by
>> the arch.
>>
>> /**
>> * wrprotect_ptes - Write protect a consecutive set of pages.
>> * @mm: Address space that the pages are mapped into.
>> * @addr: Address of first page to write protect.
>> * @ptep: Page table pointer for the first entry.
>> * @nr: Number of pages to write protect.
>> *
>> * May be overridden by the architecture, else implemented as a loop over
>> * ptep_set_wrprotect().
>> *
>> * Context: The caller holds the page table lock. The PTEs are all in the same
>> * PMD.
>> */
>>
>
> I could have sworn I had a documentation at some point. Let me add some, thanks.
>
> [...]
>
>>> +
>>> + /*
>>> + * If we likely have to copy, just don't bother with batching. Make
>>> + * sure that the common "small folio" case stays as fast as possible
>>> + * by keeping the batching logic separate.
>>> + */
>>> + if (unlikely(!*prealloc && folio_test_large(folio) && max_nr != 1)) {
>>> + nr = folio_pte_batch(folio, addr, src_pte, pte, max_nr);
>>> + if (folio_test_anon(folio)) {
>>> + folio_ref_add(folio, nr);
>>> + if (unlikely(folio_try_dup_anon_rmap_ptes(folio, page,
>>> + nr, src_vma))) {
>>
>> What happens if its not the first page of the batch that fails here? Aren't you
>> signalling that you need a prealloc'ed page for the wrong pte? Shouldn't you
>> still batch copy all the way up to the failing page first? Perhaps it all comes
>> out in the wash and these events are so infrequent that we don't care about the
>> lost batching opportunity?
>
> I assume you mean the weird corner case that some folio pages in the range have
> PAE set, others don't -- and the folio maybe pinned.
>
> In that case, we fallback to individual pages, and might have preallocated a
> page although we wouldn't have to preallocate one for processing the next page
> (that doesn't have PAE set).
>
> It should all work, although not optimized to the extreme, and as it's a corner
> case, we don't particularly care. Hopefully, in the future we'll only have a
> single PAE flag per folio.
>
> Or am I missing something?
No, your explanation makes sense. Just wanted to check this all definitely
worked, because the flow is slightly different to my previous version that was
doing try_dup_rmap page-by-page.
>
>>
>>> + folio_ref_sub(folio, nr);
>>> + return -EAGAIN;
>>> + }
>>> + rss[MM_ANONPAGES] += nr;
>>> + VM_WARN_ON_FOLIO(PageAnonExclusive(page), folio);
>>> + } else {
>>> + folio_ref_add(folio, nr);
>>
>> Perhaps hoist this out to immediately after folio_pte_batch() since you're
>> calling it on both branches?
>
> Makes sense, thanks.
>
Powered by blists - more mailing lists