Message-ID: <20241219163602.GF24724@willie-the-truck>
Date: Thu, 19 Dec 2024 16:36:02 +0000
From: Will Deacon <will@...nel.org>
To: Mikołaj Lenczewski <miko.lenczewski@....com>
Cc: ryan.roberts@....com, catalin.marinas@....com, corbet@....net,
	maz@...nel.org, oliver.upton@...ux.dev, joey.gouly@....com,
	suzuki.poulose@....com, yuzenghui@...wei.com,
	linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org, kvmarm@...ts.linux.dev
Subject: Re: [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in
 contpte_convert() under BBML2

On Wed, Dec 11, 2024 at 04:01:40PM +0000, Mikołaj Lenczewski wrote:
> When converting a region via contpte_convert() to use mTHP, we have two
> different goals. We have to mark each entry as contiguous, and we would
> like to smear the dirty and young (access) bits across all entries in
> the contiguous block. Currently, we do this by first accumulating the
> dirty and young bits in the block, using an atomic
> __ptep_get_and_clear() and the relevant pte_{dirty,young}() calls,
> performing a tlbi, and finally smearing the correct bits across the
> block using __set_ptes().
> 
> This approach works fine for BBM level 0, but with support for BBM level
> 2 we are allowed to reorder the tlbi to after setting the pagetable
> entries. This reordering means that other threads will not see an
> invalid pagetable entry, instead operating on stale data, until we have
> performed our smearing and issued the invalidation. Avoiding this
> invalid entry reduces faults in other threads, and thus improves
> performance marginally (more so when there are more threads).
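
The two orderings described in the quoted text can be sketched as a toy user-space model. This is only a sketch under stated assumptions, not the kernel implementation: `struct toy_pte`, `toy_contpte_convert()`, the `step` log, and the block size of 16 are hypothetical names and values chosen for illustration. The TLB invalidation is merely recorded as a log entry, not performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative block size (assumption for this sketch only). */
#define CONT_PTES 16

/* Hypothetical stand-in for a page table entry; not the kernel's pte_t. */
struct toy_pte {
	bool valid;
	bool cont;
	bool dirty;
	bool young;
};

/* The three steps named in the patch description. */
enum step { STEP_GATHER_AND_CLEAR, STEP_TLBI, STEP_SET_PTES };

/* Accumulate dirty/young over the block and invalidate each entry. */
static void gather_and_clear(struct toy_pte *ptes, bool *dirty, bool *young)
{
	for (size_t i = 0; i < CONT_PTES; i++) {
		*dirty |= ptes[i].dirty;
		*young |= ptes[i].young;
		ptes[i] = (struct toy_pte){ 0 };
	}
}

/* Write valid, contiguous entries carrying the accumulated bits. */
static void set_ptes(struct toy_pte *ptes, bool dirty, bool young)
{
	for (size_t i = 0; i < CONT_PTES; i++)
		ptes[i] = (struct toy_pte){ .valid = true, .cont = true,
					    .dirty = dirty, .young = young };
}

/*
 * Record the order of the steps in log[] (which must hold 3 entries)
 * and return the number of steps logged.
 * bbml2 == false: gather/clear, tlbi, set (the current ordering).
 * bbml2 == true:  gather/clear, set, tlbi (the proposed ordering).
 */
size_t toy_contpte_convert(struct toy_pte *ptes, bool bbml2, enum step *log)
{
	bool dirty = false, young = false;
	size_t n = 0;

	gather_and_clear(ptes, &dirty, &young);
	log[n++] = STEP_GATHER_AND_CLEAR;

	if (!bbml2)
		log[n++] = STEP_TLBI;	/* invalidate before the new entries */

	set_ptes(ptes, dirty, young);
	log[n++] = STEP_SET_PTES;

	if (bbml2)
		log[n++] = STEP_TLBI;	/* invalidate after the new entries */

	return n;
}
```

In both modes every entry ends up valid, contiguous, and carrying the smeared dirty and young bits. The only difference the model captures is where the invalidation sits relative to the final write. It says nothing about the fault rates or performance in question.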

Please provide the performance data.

Will