Message-ID: <CAHbLzkp-Nj3vmAWqJw_GZZ6oMmH5Bwv5eObvF+a3VHWa6p=q8w@mail.gmail.com>
Date: Tue, 1 Jul 2025 08:40:09 -0700
From: Yang Shi <shy828301@...il.com>
To: siddhartha@...ip.in
Cc: Dev Jain <dev.jain@....com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, mgorman@...e.de,
Vlastimil Babka <vbabka@...e.cz>, Rik van Riel <riel@...riel.com>
Subject: Re: [PATCH] mm: limit THP alignment – performance gain observed in AI inference workloads
>
> 🤖 3. How does this impact AI workloads like Hugging Face Transformers?
> Tokenization and dynamic batching create non-deterministic memory
> allocation patterns:
>
> Models like BERT and T5 dynamically allocate intermediate buffers per
> token-length, batch size, and attention window.
>
> Hugging Face + ONNX Runtime uses multiple small-ish anonymous mmap()s,
> often 512KB–1.8MB.
If I remember correctly, Rik's patch only forces PMD alignment when the
allocation size is a multiple of the PMD size. Such VMA fragmentation
should be caused by allocations that are greater than 2M but not a
multiple of 2M, so they create a 2M PMD mapping plus a bunch of 4K
PTEs. Allocations smaller than 2M should be placed right next to each
other and stay mergeable. Did I miss something?
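
For what it's worth, the placement is easy to check from userspace.
Below is a quick sketch (not from the patch; the sizes are made up to
mirror the cases above, and it assumes a 2M PMD on x86_64) that maps a
sub-2M region, an exact 2M multiple, and a >2M non-multiple, then
prints where each one lands:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define PMD_SZ	(2UL << 20)	/* assumed 2M PMD size */

int main(void)
{
	/* illustrative sizes: sub-2M, exact 2M multiple, >2M non-multiple */
	size_t sizes[] = { 1UL << 20, PMD_SZ, PMD_SZ + (1UL << 19) };
	void *p;
	int i;

	for (i = 0; i < 3; i++) {
		p = mmap(NULL, sizes[i], PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		printf("len %zu addr %p PMD aligned: %s\n", sizes[i], p,
		       ((unsigned long)p & (PMD_SZ - 1)) ? "no" : "yes");
	}

	/* pause so /proc/self/maps can be inspected for merged VMAs */
	getchar();
	return 0;
}

On a kernel with Rik's patch I would expect only the 2M-multiple
mapping to come back PMD aligned, and the sub-2M mappings to show up
back to back (often merged into a single VMA) in /proc/self/maps.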
Thanks,
Yang
>
> These allocations come in bursts — but due to forced alignment, the
> kernel was placing them with artificial gaps, defeating THP eligibility
> entirely.
>