Message-ID: <fe231f2d-fcb1-05c9-49c3-405c533a0200@suse.de>
Date: Mon, 28 Oct 2024 14:45:01 +0100 (CET)
From: Michael Matz <matz@...e.de>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
cc: Vlastimil Babka <vbabka@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Jann Horn <jannh@...gle.com>,
Thorsten Leemhuis <regressions@...mhuis.info>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Petr Tesarik <ptesarik@...e.com>,
Gabriel Krisman Bertazi <gabriel@...sman.be>,
Matthias Bodenbinder <matthias@...enbinder.de>, stable@...r.kernel.org,
Rik van Riel <riel@...riel.com>, Yang Shi <yang@...amperecomputing.com>
Subject: Re: [PATCH hotfix 6.12] mm, mmap: limit THP alignment of anonymous
 mappings to PMD-aligned sizes

Hello,

On Thu, 24 Oct 2024, Lorenzo Stoakes wrote:
> > benchmark seems to create many mappings of 4632kB, which would have
> > merged into a large THP-backed area before commit efa7df3e3bb5, and
> > now they are fragmented into multiple areas, each aligned to a PMD
> > boundary with gaps between them. The regression then seems to be
> > caused mainly by the benchmark's memory access pattern suffering from
> > TLB or cache aliasing due to the aligned boundaries of the individual
> > areas.
>
> Any more details on precisely why?
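
As an aside, the fragmentation arithmetic in the quoted paragraph is easy
to make concrete (a minimal sketch; the 4632kB figure is from the report,
the 2 MB PMD size is the usual x86-64 value):

  /* Sketch: a 4632 kB mapping is not a multiple of 2 MB, so PMD-aligning
   * each one leaves a gap to the next and the VMAs can never merge. */
  #include <stdio.h>

  #define KB       1024UL
  #define PMD_SIZE (2048 * KB)

  int main(void)
  {
          unsigned long len = 4632 * KB;
          unsigned long footprint = (len + PMD_SIZE - 1) & ~(PMD_SIZE - 1);

          printf("len=%lukB aligned footprint=%lukB gap=%lukB\n",
                 len / KB, footprint / KB, (footprint - len) / KB);
          /* -> len=4632kB aligned footprint=6144kB gap=1512kB */
          return 0;
  }
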
Everything we found out and theorized about is in the SUSE bug report. I
think the best theory is TLB aliasing: the mixing^Whash function in the
given hardware uses too few bits, and most of them from the low 21-12
bits of an address (exactly the bits that PMD-aligned mappings mostly
share at any given array offset). Of course it then still depends on the
particular access pattern. cactuBSSN has about 20 memory streams in the
hot loops, and the accesses are fairly regular from step to step (plus or
minus certain strides in 3D arrays). When the start addresses of the
streams differ only in the upper bits, you hit TLB aliasing from time to
time; when the dimensions/strides are just right, it occurs often, the
N-way associativity no longer saves you, and you hit it very, very hard.
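
To illustrate that theory with a hypothetical sketch (the set count and
index bits below are assumptions for illustration, not the actual hash of
any of the affected CPUs): suppose a TLB indexes its 512 sets purely by
address bits 20-12. Twenty PMD-aligned streams accessed at the same array
offset then all land in the same set, and no realistic associativity can
absorb that:

  /* Hypothetical: a TLB with 512 sets indexed only by address bits
   * 20-12.  All 2 MB (PMD) aligned streams accessed at the same offset
   * map to the same set. */
  #include <stdio.h>

  #define SET_BITS 9
  #define SET_MASK ((1UL << SET_BITS) - 1)

  static unsigned long tlb_set(unsigned long vaddr)
  {
          return (vaddr >> 12) & SET_MASK;   /* drop the 4 kB page offset */
  }

  int main(void)
  {
          unsigned long off = 0x12345;   /* same array offset in every stream */
          int i;

          for (i = 0; i < 20; i++) {     /* ~20 streams, as in cactuBSSN */
                  unsigned long base = 0x700000000000UL + i * (2UL << 20);
                  printf("stream %2d: addr=%#lx set=%lu\n",
                         i, base + off, tlb_set(base + off));
          }
          return 0;   /* every stream prints set 18: total aliasing */
  }
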
It was interesting to see how broad the range of CPUs and vendors
exhibiting the problem was (with varying degrees of severity, from 50% to
600% slowdown), and how more recent CPUs no longer show the symptom. I
guess the micro-arch guys eventually convinced P&R management that
hashing another bit or two is worth the silicon :-)

Ciao,
Michael.