Message-ID: <CAHbLzkp-Nj3vmAWqJw_GZZ6oMmH5Bwv5eObvF+a3VHWa6p=q8w@mail.gmail.com>
Date: Tue, 1 Jul 2025 08:40:09 -0700
From: Yang Shi <shy828301@...il.com>
To: siddhartha@...ip.in
Cc: Dev Jain <dev.jain@....com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, mgorman@...e.de, 
	Vlastimil Babka <vbabka@...e.cz>, Rik van Riel <riel@...riel.com>
Subject: Re: [PATCH] mm: limit THP alignment – performance gain observed in AI inference workloads

>
> 🤖 3. How does this impact AI workloads like Hugging Face Transformers?
> Tokenization and dynamic batching create non-deterministic memory
> allocation patterns:
>
> Models like BERT and T5 dynamically allocate intermediate buffers per
> token-length, batch size, and attention window.
>
> Hugging Face + ONNX Runtime uses multiple small-ish anonymous mmap()s,
> often 512KB–1.8MB.

If I remember correctly, Rik's patch should only force PMD alignment
when the allocation size is greater than the PMD size. VMA fragmentation
like this should be caused by allocations that are larger than 2M but
whose size is not PMD aligned, so each one ends up as a 2M PMD plus a
bunch of 4K PTEs. Allocations smaller than 2M should be placed right
next to each other and be mergeable. Did I miss something?
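
To make the arithmetic concrete, here is a rough userspace sketch of the
rule as described above. It is not the kernel code: the function names
are hypothetical, the 4K/2M sizes assume x86-64, and the "greater than
PMD size" threshold is taken from the recollection in this mail rather
than checked against the source.

```python
# Simplified model only; names and constants are illustrative assumptions.
PAGE_SIZE = 4 * 1024          # 4K base page (x86-64 assumption)
PMD_SIZE = 2 * 1024 * 1024    # 2M PMD-mapped huge page (x86-64 assumption)


def wants_pmd_alignment(length):
    """True if the described rule would force PMD alignment for this size."""
    return length > PMD_SIZE


def split_mapping(length):
    """For a PMD-aligned start address, return (2M PMD mappings, 4K PTE pages)."""
    pmds, tail = divmod(length, PMD_SIZE)
    ptes = -(-tail // PAGE_SIZE)  # round the tail up to whole 4K pages
    return pmds, ptes


if __name__ == "__main__":
    # Sizes from the thread (512KB, roughly 1.8MB) plus one >2M unaligned size.
    for size in (512 * 1024, 1800 * 1024, 3 * 1024 * 1024):
        pmds, ptes = split_mapping(size)
        print(size // 1024, "KB: align =", wants_pmd_alignment(size),
              "PMDs =", pmds, "PTE pages =", ptes)
```

Under this model the 512KB and 1.8MB mappings are never force-aligned,
while a 3M mapping is aligned and splits into one 2M PMD plus a 1M tail
of 4K PTEs.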

Thanks,
Yang


>
> These allocations come in bursts — but due to forced alignment, the
> kernel was placing them with artificial gaps, defeating THP eligibility
> entirely.
>
