[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241219163602.GF24724@willie-the-truck>
Date: Thu, 19 Dec 2024 16:36:02 +0000
From: Will Deacon <will@...nel.org>
To: Mikołaj Lenczewski <miko.lenczewski@....com>
Cc: ryan.roberts@....com, catalin.marinas@....com, corbet@....net,
maz@...nel.org, oliver.upton@...ux.dev, joey.gouly@....com,
suzuki.poulose@....com, yuzenghui@...wei.com,
linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, kvmarm@...ts.linux.dev
Subject: Re: [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in
contpte_convert() under BBML2
On Wed, Dec 11, 2024 at 04:01:40PM +0000, Mikołaj Lenczewski wrote:
> When converting a region via contpte_convert() to use mTHP, we have two
> different goals. We have to mark each entry as contiguous, and we would
> like to smear the dirty and young (access) bits across all entries in
> the contiguous block. Currently, we do this by first accumulating the
> dirty and young bits in the block, using an atomic
> __ptep_get_and_clear() and the relevant pte_{dirty,young}() calls,
> performing a tlbi, and finally smearing the correct bits across the
> block using __set_ptes().
>
> This approach works fine for BBM level 0, but with support for BBM level
> 2 we are allowed to reorder the tlbi to after setting the pagetable
> entries. This reordering means that other threads will not see an
> invalid pagetable entry, instead operating on stale data, until we have
> performed our smearing and issued the invalidation. Avoiding this
> invalid entry reduces faults in other threads, and thus improves
> performance marginally (more so when there are more threads).
Please provide the performance data.
Will
Powered by blists - more mailing lists