Message-ID: <4ef64fd1-f605-4ddf-82e6-74b5e2c43892@intel.com>
Date: Wed, 31 Jan 2024 10:20:25 +0800
From: Yin Fengwei <fengwei.yin@...el.com>
To: David Hildenbrand <david@...hat.com>, <linux-kernel@...r.kernel.org>
CC: <linux-mm@...ck.org>, Andrew Morton <akpm@...ux-foundation.org>, "Matthew
Wilcox" <willy@...radead.org>, Ryan Roberts <ryan.roberts@....com>, "Catalin
Marinas" <catalin.marinas@....com>, Will Deacon <will@...nel.org>, "Aneesh
Kumar K.V" <aneesh.kumar@...ux.ibm.com>, Nick Piggin <npiggin@...il.com>,
Peter Zijlstra <peterz@...radead.org>, Michael Ellerman <mpe@...erman.id.au>,
Christophe Leroy <christophe.leroy@...roup.eu>, "Naveen N. Rao"
<naveen.n.rao@...ux.ibm.com>, Heiko Carstens <hca@...ux.ibm.com>, "Vasily
Gorbik" <gor@...ux.ibm.com>, Alexander Gordeev <agordeev@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>, Sven Schnelle
<svens@...ux.ibm.com>, Arnd Bergmann <arnd@...db.de>,
<linux-arch@...r.kernel.org>, <linuxppc-dev@...ts.ozlabs.org>,
<linux-s390@...r.kernel.org>, "Huang, Ying" <ying.huang@...el.com>
Subject: Re: [PATCH v1 0/9] mm/memory: optimize unmap/zap with PTE-mapped THP
On 1/29/24 22:32, David Hildenbrand wrote:
> This series is based on [1] and must be applied on top of it.
> Similar to what we did with fork(), let's implement PTE batching
> during unmap/zap when processing PTE-mapped THPs.
>
> We collect consecutive PTEs that map consecutive pages of the same large
> folio, making sure that the other PTE bits are compatible, and (a) adjust
> the refcount only once per batch, (b) call rmap handling functions only
> once per batch, (c) perform batch PTE setting/updates and (d) perform TLB
> entry removal once per batch.
>
> Ryan was previously working on this in the context of cont-pte for
> arm64, in the latest iteration [2] with a focus on arm64 with cont-pte only.
> This series implements the optimization for all architectures, independent
> of such PTE bits, teaches MMU gather/TLB code to be fully aware of such
> large-folio-pages batches as well, and makes use of our new rmap batching
> function when removing the rmap.
>
> To achieve that, we have to enlighten MMU gather / page freeing code
> (i.e., everything that consumes encoded_page) to process unmapping
> of consecutive pages that all belong to the same large folio. I'm being
> very careful to not degrade order-0 performance, and it looks like I
> managed to achieve that.
One possible scenario:
If every folio is a 2M folio, then one full batch could hold 510M of memory.
Isn't that too much, given that a full batch previously could hold only
(2M - 4096 * 2) bytes of memory?
Regards
Yin, Fengwei