Open Source and information security mailing list archives
 
Message-ID: <979f8223-68b0-4a75-b410-fd86cfe6c372@mailbox.org>
Date: Thu, 11 Sep 2025 16:48:19 +0200
From: Michel Dänzer <michel.daenzer@...lbox.org>
To: Christian König <christian.koenig@....com>,
 Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Cc: Huang Rui <ray.huang@....com>, Matthew Auld <matthew.auld@...el.com>,
 Matthew Brost <matthew.brost@...el.com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
 kernel-dev@...lia.com, Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: Re: [PATCH] drm: ttm: do not direct reclaim when allocating high
 order pages

On 11.09.25 16:31, Christian König wrote:
> On 11.09.25 14:49, Michel Dänzer wrote:
>>>>> What we are seeing here, on a low-memory (4GiB) single-node system with
>>>>> an APU, is lots of latency when allocating memory: direct reclaim tries
>>>>> to satisfy order-10 allocations, fails, and steps down order by order
>>>>> until it reaches order-4 or order-3. With this change, we no longer see
>>>>> those latencies, and memory pressure goes down as well.
>>>> That reminds me of the scenario I described in the 00862edba135 ("drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages") commit log, where taking a filesystem backup could cause Firefox to freeze for on the order of a minute.
>>>>
>>>> Something like that can't just be ignored as "not a problem" for a potential 30% performance gain.
>>>
>>> Well, using 2MiB pages is actually a must-have for certain HW features, and we have quite a lot of people pushing to always use them.
>>
>> Latency can't just be ignored though. Interactive apps intermittently freezing because this code desperately tries to reclaim huge pages while the system is under memory pressure isn't acceptable.
> 
> Why should that not be acceptable?

Sounds like you didn't read / understand the scenario in the 00862edba135 commit log:

I was trying to use Firefox while restic was taking a filesystem backup, and it froze for up to a minute. After disabling direct reclaim, Firefox was perfectly usable without noticeable freezes in the same scenario.

Show me the user who finds it acceptable to wait for a minute for interactive apps to respond, just in case some GPU operations might be 30% faster.


> The purpose of the fallback is to allow displaying messages like "Your system is low on memory, please close some application!" instead of triggering the OOM killer directly.

That's not the issue here.


-- 
Earthling Michel Dänzer       \        GNOME / Xwayland / Mesa developer
https://redhat.com             \               Libre software enthusiast
