Message-ID: <424115a2-a924-4c28-8027-32db6ab9278d@arm.com>
Date: Wed, 31 Jan 2024 10:31:51 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Matthew Wilcox <willy@...radead.org>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
Nick Piggin <npiggin@...il.com>, Peter Zijlstra <peterz@...radead.org>,
Michael Ellerman <mpe@...erman.id.au>,
Christophe Leroy <christophe.leroy@...roup.eu>,
"Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Arnd Bergmann <arnd@...db.de>,
linux-arch@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-s390@...r.kernel.org
Subject: Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP
On 31/01/2024 10:21, David Hildenbrand wrote:
>
>>> +
>>> +#ifndef clear_full_ptes
>>> +/**
>>> + * clear_full_ptes - Clear PTEs that map consecutive pages of the same folio.
>>
>> I know it's implied from "pages of the same folio" (and even more so for the
>> above variant due to the mention of access/dirty), but I wonder if it's useful
>> to explicitly state that "all PTEs being cleared are present at the time of
>> the call"?
>
> "Clear PTEs" -> "Clear present PTEs" ?
>
> That should make it clearer.
Works for me.
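i.e. something like this? (Just sketching the kerneldoc from memory of the
patch context, so only take the first line as the actual proposal; the
parameter descriptions may not match your patch exactly):

  /**
   * clear_full_ptes - Clear present PTEs that map consecutive pages of the
   *                   same folio.
   * @mm: Address space the pages are mapped into.
   * @addr: Address the first page is mapped at.
   * @ptep: Page table pointer for the first entry.
   * @nr: Number of entries to clear.
   * @full: Whether we are clearing a full mm.
   */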
>
> [...]
>
>>> if (!delay_rmap) {
>>> - folio_remove_rmap_pte(folio, page, vma);
>>> + folio_remove_rmap_ptes(folio, page, nr, vma);
>>> +
>>> + /* Only sanity-check the first page in a batch. */
>>> if (unlikely(page_mapcount(page) < 0))
>>> print_bad_pte(vma, addr, ptent, page);
>>
>> Is there a case for either removing this altogether or moving it into
>> folio_remove_rmap_ptes()? It seems odd to only check some pages.
>>
>
> I really wanted to avoid another nasty loop here.
>
> In my thinking, for 4k folios, or when zapping subpages of large folios, we
> still perform the exact same checks; it's only when batching that we don't. So
> if there is some problem, there are still ways to get it triggered. And these
> problems are hardly ever seen.
>
> folio_remove_rmap_ptes() feels like the better place -- especially because the
> delayed-rmap handling is effectively unchecked. But in there, we cannot
> "print_bad_pte()".
>
> [background: if we had a total mapcount -- in other words, a cheap
> folio_mapcount() -- I'd check here that the total mapcount does not underflow,
> instead of checking per-subpage]
All good points... perhaps extend the comment to describe how this could be
solved in future with a cheap total_mapcount()? Or in the commit log if you
prefer?
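Something like the below, perhaps (the wording is just a suggestion):

	/*
	 * Only sanity-check the first page in a batch; checking every
	 * subpage would require another loop over the batch. If we ever
	 * get a cheap folio_mapcount() (i.e. a total mapcount), we could
	 * check for underflow of that here instead.
	 */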
>
>>
>>> }
>>> - if (unlikely(__tlb_remove_page(tlb, page, delay_rmap))) {
>>> + if (unlikely(__tlb_remove_folio_pages(tlb, page, nr, delay_rmap))) {
>>> *force_flush = true;
>>> *force_break = true;
>>> }
>>> }
>>> -static inline void zap_present_pte(struct mmu_gather *tlb,
>>> +/*
>>> + * Zap or skip one present PTE, trying to batch-process subsequent PTEs that map
>>
>> Zap or skip *at least* one... ?
>
> Ack
>
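So, for the record, the comment would then start something like this (assuming
the rest of the sentence is unchanged from your patch):

  /*
   * Zap or skip at least one present PTE, trying to batch-process subsequent
   * PTEs that map consecutive pages of the same folio.
   */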