linux-kernel - Re: [PATCH] drm: ttm: do not direct reclaim when allocating high order pages

Message-ID: <b7c57dc3-ed0e-402f-8a3c-f832357f8763@amd.com>
Date: Thu, 11 Sep 2025 11:07:01 +0200
From: Christian König <christian.koenig@....com>
To: Michel Dänzer <michel.daenzer@...lbox.org>,
 Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Cc: Huang Rui <ray.huang@....com>, Matthew Auld <matthew.auld@...el.com>,
 Matthew Brost <matthew.brost@...el.com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
 kernel-dev@...lia.com, Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: Re: [PATCH] drm: ttm: do not direct reclaim when allocating high
 order pages

On 11.09.25 10:26, Michel Dänzer wrote:
> On 10.09.25 14:52, Thadeu Lima de Souza Cascardo wrote:
>> On Wed, Sep 10, 2025 at 02:11:58PM +0200, Christian König wrote:
>>> On 10.09.25 13:59, Thadeu Lima de Souza Cascardo wrote:
>>>> When the TTM pool tries to allocate new pages, it starts with max order. If
>>>> there are no pages ready in the system, the page allocator will start
>>>> reclaim. If direct reclaim fails, the allocator will reduce the order until
>>>> it gets all the pages it wants with whatever order the allocator succeeds
>>>> to reclaim.
>>>>
>>>> However, while the allocator is reclaiming, lower order pages might be
>>>> available, which would work just fine for the pool allocator. Doing direct
>>>> reclaim just introduces latency in allocating memory.
>>>>
>>>> The system should still start reclaiming in the background with kswapd, but
>>>> the pool allocator should try to allocate a lower order page instead of
>>>> directly reclaiming.
>>>>
>>>> If not even an order-1 page is available, the TTM pool allocator will
>>>> eventually get to start allocating order-0 pages, at which point it should
>>>> and will directly reclaim.
>>>
>>> Yeah that was discussed before quite a bit but at least for AMD GPUs that is absolutely not something we should do.
>>>
>>> The performance difference between using high and low order pages can be up to 30%. So the added extra latency is just vital for good performance.
>>>
>>> We could of course make that depend on the HW you use if it isn't necessary for some other GPU, but at least both NVidia and Intel seem to have pretty much the same HW restrictions.
>>>
>>> NVidia has been working on extending this to even use 1GiB pages to reduce the TLB overhead even further.
>>
>> But if the system cannot reclaim or is working hard on reclaiming, it will
>> not allocate that page and the pool allocator will resort to lower order
>> pages anyway.
>>
>> In case the system has pages available, it will use them. I think there is
>> a balance here and I find this one is reasonable. If the system is not
>> under pressure, it will allocate those higher order pages, as expected.
>>
>> I can look into the behavior when the system might be fragmented, but I
>> still believe that the pool is offering such a protection by keeping those
>> higher order pages around. It is when the system is under memory pressure
>> that we need to resort to lower order pages.
>>
>> What we are seeing here is on a low memory (4GiB) single node system with
>> an APU, that it will have lots of latencies trying to allocate memory by
>> doing direct reclaim trying to allocate order-10 pages, which will fail and
>> down it goes until it gets to order-4 or order-3. With this change, we
>> don't see those latencies anymore and memory pressure goes down as well.
> That reminds me of the scenario I described in the 00862edba135 ("drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages") commit log, where taking a filesystem backup could cause Firefox to freeze for on the order of a minute.
> 
> Something like that can't just be ignored as "not a problem" for a potential 30% performance gain.

Well, using 2MiB pages is actually a must-have for certain HW features, and we have quite a lot of people pushing to always use them.

So the fact that TTM still falls back to lower order allocations is just a compromise to avoid triggering the OOM killer.

What we could do is remove the fallback, but then Cascardo's use case wouldn't work at all any more.
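
To make the trade-off under discussion concrete, here is a rough userspace sketch of the order fallback loop. It is not the TTM pool code: `fake_mm`, `fake_alloc`, `pool_alloc` and `SKETCH_MAX_ORDER` are hypothetical names, and the model simply assumes that direct reclaim costs one stall per attempt and only ever produces order-0 pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_ORDER 10u /* assumption: highest order tried first */

/* Hypothetical model of the system: free blocks per order, plus a stall counter. */
struct fake_mm {
	unsigned free_blocks[SKETCH_MAX_ORDER + 1];
	unsigned direct_reclaims; /* how often an attempt stalled in direct reclaim */
};

/*
 * Try to take one block of the given order. If none is free and the caller
 * allows it, model direct reclaim as one stall. Assumption: reclaim only
 * ever yields an order-0 page, so it fails for every higher order.
 */
static bool fake_alloc(struct fake_mm *mm, unsigned order, bool may_direct_reclaim)
{
	if (mm->free_blocks[order] > 0) {
		mm->free_blocks[order]--;
		return true;
	}
	if (!may_direct_reclaim)
		return false;
	mm->direct_reclaims++;
	return order == 0;
}

/*
 * Allocate num_pages pages, starting at the highest useful order and
 * falling back one order at a time. With reclaim_high_order == false only
 * order-0 attempts may direct reclaim, which is the behaviour the patch
 * proposes. Returns the number of direct reclaim stalls.
 */
static unsigned pool_alloc(struct fake_mm *mm, size_t num_pages, bool reclaim_high_order)
{
	unsigned order = SKETCH_MAX_ORDER;

	mm->direct_reclaims = 0;
	while (num_pages > 0) {
		/* Never try an order larger than what is still needed. */
		while (order > 0 && ((size_t)1 << order) > num_pages)
			order--;

		bool may_reclaim = reclaim_high_order || order == 0;

		if (fake_alloc(mm, order, may_reclaim)) {
			num_pages -= (size_t)1 << order;
		} else {
			assert(order > 0); /* order-0 attempts always succeed in this model */
			order--;
		}
	}
	return mm->direct_reclaims;
}
```

In this toy model, a request for 1024 pages on a system that only has order-3 blocks free stalls once per failed order from 10 down to 4 when high order reclaim is allowed, and not at all when it is skipped. What the model does not capture is the cost of ending up with the smaller pages, which is the performance concern raised above.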

Regards,
Christian.