Message-ID: <20241219163602.GF24724@willie-the-truck>
Date: Thu, 19 Dec 2024 16:36:02 +0000
From: Will Deacon <will@...nel.org>
To: Mikołaj Lenczewski <miko.lenczewski@....com>
Cc: ryan.roberts@....com, catalin.marinas@....com, corbet@....net,
	maz@...nel.org, oliver.upton@...ux.dev, joey.gouly@....com,
	suzuki.poulose@....com, yuzenghui@...wei.com,
	linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org, kvmarm@...ts.linux.dev
Subject: Re: [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in
 contpte_convert() under BBML2

On Wed, Dec 11, 2024 at 04:01:40PM +0000, Mikołaj Lenczewski wrote:
> When converting a region via contpte_convert() to use mTHP, we have two
> different goals. We have to mark each entry as contiguous, and we would
> like to smear the dirty and young (access) bits across all entries in
> the contiguous block. Currently, we do this by first accumulating the
> dirty and young bits in the block, using an atomic
> __ptep_get_and_clear() and the relevant pte_{dirty,young}() calls,
> performing a tlbi, and finally smearing the correct bits across the
> block using __set_ptes().
> 
> This approach works fine for BBM level 0, but with support for BBM level
> 2 we are allowed to reorder the tlbi to after setting the pagetable
> entries. This reordering means that other threads will not see an
> invalid pagetable entry; instead they will operate on stale data until
> we have performed our smearing and issued the invalidation. Avoiding this
> invalid entry reduces faults in other threads, and thus improves
> performance marginally (more so when there are more threads).
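
To make the two orderings concrete, below is a minimal standalone C sketch
(userspace, not kernel code). The pte_t struct and the helpers
gather_and_clear(), set_block() and tlbi_range() are simplified stand-ins of
my own for __ptep_get_and_clear(), __set_ptes() and the tlbi performed by
contpte_convert(), so the details differ from the real arm64 implementation.

/*
 * Standalone sketch contrasting the two orderings described in the
 * quoted commit message. All names below are simplified stand-ins,
 * not the arm64 pgtable API.
 */
#include <stdbool.h>
#include <stdio.h>

#define CONT_PTES 16

typedef struct { bool valid, cont, dirty, young; } pte_t;

static pte_t ptes[CONT_PTES];		/* mock contiguous pte block */

static void tlbi_range(void)		/* stand-in for the tlbi     */
{
	printf("tlbi: invalidate contiguous range\n");
}

/* Accumulate dirty/young while clearing the block (both variants). */
static pte_t gather_and_clear(void)
{
	pte_t acc = ptes[0];

	for (int i = 0; i < CONT_PTES; i++) {
		pte_t old = ptes[i];

		ptes[i].valid = false;	/* __ptep_get_and_clear()-like */
		acc.dirty |= old.dirty;
		acc.young |= old.young;
	}
	return acc;
}

/* Write the block back with cont + smeared dirty/young bits. */
static void set_block(pte_t tmpl)
{
	for (int i = 0; i < CONT_PTES; i++) {
		ptes[i] = tmpl;		/* __set_ptes()-like */
		ptes[i].valid = true;
		ptes[i].cont = true;
	}
}

/* BBM level 0 ordering: clear, invalidate, then repaint the block. */
static void convert_bbm0(void)
{
	pte_t tmpl = gather_and_clear();

	tlbi_range();			/* other threads may fault here  */
	set_block(tmpl);
}

/* BBML2 ordering: repaint first, invalidate last (the patch's idea). */
static void convert_bbml2(void)
{
	pte_t tmpl = gather_and_clear();

	set_block(tmpl);		/* entries never seen as invalid */
	tlbi_range();			/* stale TLB entries tolerated   */
}

int main(void)
{
	convert_bbm0();
	convert_bbml2();
	return 0;
}

The only difference between the two functions is where the invalidation sits
relative to repainting the block, which is the reordering BBML2 permits.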

Please provide the performance data.

Will
