Message-ID: <fe231f2d-fcb1-05c9-49c3-405c533a0200@suse.de>
Date: Mon, 28 Oct 2024 14:45:01 +0100 (CET)
From: Michael Matz <matz@...e.de>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
cc: Vlastimil Babka <vbabka@...e.cz>, 
    Andrew Morton <akpm@...ux-foundation.org>, 
    "Liam R. Howlett" <Liam.Howlett@...cle.com>, Jann Horn <jannh@...gle.com>, 
    Thorsten Leemhuis <regressions@...mhuis.info>, linux-mm@...ck.org, 
    linux-kernel@...r.kernel.org, Petr Tesarik <ptesarik@...e.com>, 
    Gabriel Krisman Bertazi <gabriel@...sman.be>, 
    Matthias Bodenbinder <matthias@...enbinder.de>, stable@...r.kernel.org, 
    Rik van Riel <riel@...riel.com>, Yang Shi <yang@...amperecomputing.com>
Subject: Re: [PATCH hotfix 6.12] mm, mmap: limit THP aligment of anonymous
 mappings to PMD-aligned sizes

Hello,

On Thu, 24 Oct 2024, Lorenzo Stoakes wrote:

> > benchmark seems to create many mappings of 4632kB, which would have
> > merged to a large THP-backed area before commit efa7df3e3bb5 and now
> > they are fragmented to multiple areas each aligned to PMD boundary with
> > gaps between. The regression then seems to be caused mainly due to the
> > benchmark's memory access pattern suffering from TLB or cache aliasing
> > due to the aligned boundaries of the individual areas.
> 
> Any more details on precisely why?

Anything we found out and theorized about is in the SUSE bug report.  I 
think the best theory is TLB aliasing when the mixing^Whash function in 
the given hardware uses too few bits, most of them from the low bits 12-21 
of an address.  Of course that then still depends on the particular 
access pattern.  cactuBSSN has about 20 memory streams in the hot loops, 
and the accesses are fairly regular from step to step (plus/minus certain 
strides in 3D arrays).  When their start addresses all differ only in the 
upper bits, you will hit TLB aliasing from time to time; when the 
dimensions/strides are just right, it occurs often, the N-way associativity 
no longer saves you, and you hit it very, very hard.
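To make the aliasing argument concrete, here is a toy Python model. Assumptions (none of this is kernel or hardware code): 4 KiB pages, a 2 MiB PMD as on x86-64, a hypothetical 64-set TLB indexed only by address bits 12-17, and 20 mappings placed bottom-up one after another starting at a made-up PMD-aligned base. The helper names `mapping_starts`, `set_index` and `worst_set_load` are illustrative. The model places the 4632 kB mappings either PMD-aligned, as the quoted report describes, or packed back to back, which is what the patch restores for sizes that are not multiples of the PMD size, and counts how many streams share one TLB set while all advance by the same offset.

```python
from collections import Counter

PAGE_SHIFT = 12               # 4 KiB base pages
PMD_SIZE = 2 << 20            # 2 MiB PMD (x86-64)
MAPPING_SIZE = 4632 << 10     # 4632 kB mappings created by the benchmark
NUM_STREAMS = 20              # "about 20 memory streams in the hot loops"
TLB_SETS = 64                 # hypothetical TLB with 64 sets
BASE = 1 << 30                # hypothetical, PMD-aligned address of the first mapping


def align_up(x, a):
    return (x + a - 1) // a * a


def mapping_starts(patched):
    """Start addresses of NUM_STREAMS consecutive mappings, placed bottom-up (simplified)."""
    # Pre-patch (per the quoted report) each 4632 kB mapping is PMD-aligned;
    # the patch keeps PMD alignment only for sizes that are multiples of PMD_SIZE.
    pmd_align = (MAPPING_SIZE % PMD_SIZE == 0) if patched else True
    starts, addr = [], BASE
    for _ in range(NUM_STREAMS):
        if pmd_align:
            addr = align_up(addr, PMD_SIZE)
        starts.append(addr)
        addr += MAPPING_SIZE
    return starts


def set_index(addr):
    """Naive set index: just the lowest virtual page number bits (address bits 12-17)."""
    return (addr >> PAGE_SHIFT) & (TLB_SETS - 1)


def worst_set_load(starts):
    """Most streams sharing one TLB set while all streams advance in lockstep."""
    worst = 0
    for off in range(0, MAPPING_SIZE, 1 << PAGE_SHIFT):
        counts = Counter(set_index(s + off) for s in starts)
        worst = max(worst, max(counts.values()))
    return worst


for patched in (False, True):
    print("patched" if patched else "pre-patch", worst_set_load(mapping_starts(patched)))
```

In this model the PMD-aligned starts sit on a 6 MiB stride, so bits 0-20 of every stream's address are identical at any common offset and all 20 streams fall into one set; packed starts are 1158 pages apart and spread over 20 different sets. The script prints `pre-patch 20` and `patched 1`. Real TLBs are N-way set associative with undocumented index functions, so this shows only the mechanism, not the measured slowdown.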

It was interesting to see how broad the range of CPUs and vendors was that 
exhibited the problem (in various degrees of severity, from 50% to 600% 
slowdown), and that more recent CPUs no longer show the symptom.  I 
guess the micro-arch guys eventually convinced P&R management that hashing 
another bit or two is worth the silicon :-)
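The point about hashing another bit or two can be illustrated with a similar toy model. Assumptions: a hypothetical 64-set TLB, 4 KiB pages, 20 PMD-aligned mappings on a 6 MiB stride (4632 kB rounded up to the next 2 MiB boundary), and a made-up index function `hashed_index` that XORs `extra_bits` address bits, taken from bit 21 upward, into the low set-index bits. This is not how any particular CPU hashes; it only shows why folding in a few upper bits helps.

```python
from collections import Counter

PAGE_SHIFT = 12                # 4 KiB base pages
TLB_SETS = 64                  # hypothetical TLB with 64 sets
STRIDE = 6 << 20               # 4632 kB mappings, each starting on a 2 MiB boundary
BASE = 1 << 30                 # hypothetical, PMD-aligned address of the first mapping
STARTS = [BASE + i * STRIDE for i in range(20)]


def hashed_index(addr, extra_bits):
    """Low VPN bits XORed with `extra_bits` address bits taken from bit 21 upward."""
    vpn = addr >> PAGE_SHIFT
    upper = (vpn >> 9) & ((1 << extra_bits) - 1)   # vpn >> 9 == addr >> 21
    return (vpn ^ upper) & (TLB_SETS - 1)


def worst_set_load(extra_bits):
    """Most stream start addresses mapping to the same TLB set."""
    counts = Counter(hashed_index(s, extra_bits) for s in STARTS)
    return max(counts.values())


for bits in (0, 1, 2, 5):
    print(bits, worst_set_load(bits))
```

With 0, 1, 2 and 5 extra bits the busiest set holds 20, 10, 5 and 1 streams respectively, so in this model each of the first extra bits halves the pressure on the busiest set.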


Ciao,
Michael.
