lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <31a0661e-fa69-419c-9936-98bfe168d5a7@redhat.com>
Date: Tue, 23 Jan 2024 13:19:35 +0100
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>, linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
 Matthew Wilcox <willy@...radead.org>, Russell King <linux@...linux.org.uk>,
 Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 Dinh Nguyen <dinguyen@...nel.org>, Michael Ellerman <mpe@...erman.id.au>,
 Nicholas Piggin <npiggin@...il.com>,
 Christophe Leroy <christophe.leroy@...roup.eu>,
 "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>,
 "Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>,
 Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt
 <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
 Alexander Gordeev <agordeev@...ux.ibm.com>,
 Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
 Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
 Christian Borntraeger <borntraeger@...ux.ibm.com>,
 Sven Schnelle <svens@...ux.ibm.com>, "David S. Miller"
 <davem@...emloft.net>, linux-arm-kernel@...ts.infradead.org,
 linuxppc-dev@...ts.ozlabs.org, linux-riscv@...ts.infradead.org,
 linux-s390@...r.kernel.org, sparclinux@...r.kernel.org
Subject: Re: [PATCH v1 09/11] mm/memory: optimize fork() with PTE-mapped THP

[...]

> 
> I wrote some documentation for this (based on Matthew's docs for set_ptes() in
> my version. Perhaps it makes sense to add it here, given this is overridable by
> the arch.
> 
> /**
>   * wrprotect_ptes - Write protect a consecutive set of pages.
>   * @mm: Address space that the pages are mapped into.
>   * @addr: Address of first page to write protect.
>   * @ptep: Page table pointer for the first entry.
>   * @nr: Number of pages to write protect.
>   *
>   * May be overridden by the architecture, else implemented as a loop over
>   * ptep_set_wrprotect().
>   *
>   * Context: The caller holds the page table lock. The PTEs are all in the same
>   * PMD.
>   */
> 

I could have sworn I had a documentation at some point. Let me add some, 
thanks.

[...]

>> +
>> +	/*
>> +	 * If we likely have to copy, just don't bother with batching. Make
>> +	 * sure that the common "small folio" case stays as fast as possible
>> +	 * by keeping the batching logic separate.
>> +	 */
>> +	if (unlikely(!*prealloc && folio_test_large(folio) && max_nr != 1)) {
>> +		nr = folio_pte_batch(folio, addr, src_pte, pte, max_nr);
>> +		if (folio_test_anon(folio)) {
>> +			folio_ref_add(folio, nr);
>> +			if (unlikely(folio_try_dup_anon_rmap_ptes(folio, page,
>> +								  nr, src_vma))) {
> 
> What happens if its not the first page of the batch that fails here? Aren't you
> signalling that you need a prealloc'ed page for the wrong pte? Shouldn't you
> still batch copy all the way up to the failing page first? Perhaps it all comes
> out in the wash and these events are so infrequent that we don't care about the
> lost batching opportunity?

I assume you mean the weird corner case that some folio pages in the 
range have PAE set, others don't -- and the folio maybe pinned.

In that case, we fallback to individual pages, and might have 
preallocated a page although we wouldn't have to preallocate one for 
processing the next page (that doesn't have PAE set).

It should all work, although not optimized to the extreme, and as it's a 
corner case, we don't particularly care. Hopefully, in the future we'll 
only have a single PAE flag per folio.

Or am I missing something?

> 
>> +				folio_ref_sub(folio, nr);
>> +				return -EAGAIN;
>> +			}
>> +			rss[MM_ANONPAGES] += nr;
>> +			VM_WARN_ON_FOLIO(PageAnonExclusive(page), folio);
>> +		} else {
>> +			folio_ref_add(folio, nr);
> 
> Perhaps hoist this out to immediately after folio_pte_batch() since you're
> calling it on both branches?

Makes sense, thanks.

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ