Message-ID: <7c6a3aa0-c1eb-4726-988a-460c4895f615@amd.com>
Date: Fri, 19 Sep 2025 13:39:43 +0200
From: Christian König <christian.koenig@....com>
To: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@...lia.com>,
 Michel Dänzer <michel.daenzer@...lbox.org>,
 Huang Rui <ray.huang@....com>, Matthew Auld <matthew.auld@...el.com>,
 Matthew Brost <matthew.brost@...el.com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
 linux-kernel@...r.kernel.org, kernel-dev@...lia.com,
 Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: Re: [PATCH RFC v2 0/3] drm/ttm: allow direct reclaim to be skipped

On 19.09.25 13:13, Thadeu Lima de Souza Cascardo wrote:
>>>
>>>> The alternative I can offer is to disable the fallback which in your case would trigger the OOM killer.
>>>>
> 
> Warning could be as simple as removing __GFP_NOWARN. But I don't think we
> want either a warning or to trigger the OOM killer when allocating lower
> order pages is still possible. That will already happen when we get to
> order-0 pages, where there is no fallback available anymore, and then it
> makes sense to try harder and warn if no page can be allocated.
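
The fallback policy described in the quoted paragraph above can be sketched as a small userspace C simulation. This is only an illustration under stated assumptions, not the ttm_pool.c code: struct sim_mem, sim_alloc_fast() and sim_alloc() are hypothetical names, free memory is modelled as a per-order count of free blocks (with no splitting of larger blocks, unlike a real buddy allocator), and "direct reclaim" is reduced to a counter of reclaimable order-0 pages. It does not model the GPU TLB or fragmentation concerns raised in the reply.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cap, mirroring the order-10 pages mentioned in the thread. */
#define SIM_MAX_ORDER 10

/* Toy model of free memory (an assumption, not a kernel structure). */
struct sim_mem {
	unsigned int free_blocks[SIM_MAX_ORDER + 1]; /* free blocks per order */
	unsigned int reclaimable_pages; /* order-0 pages reclaim could free */
};

/* Opportunistic attempt: take a block of exactly this order, no reclaim. */
static bool sim_alloc_fast(struct sim_mem *m, unsigned int order)
{
	if (m->free_blocks[order] == 0)
		return false;
	m->free_blocks[order]--;
	return true;
}

/*
 * Try start_order down to order 1 without reclaim and without warning.
 * Only at order 0, where no fallback is left, try harder (simulated
 * reclaim) and set *warned if even that fails.
 * Returns the order obtained, or -1 on failure.
 */
static int sim_alloc(struct sim_mem *m, unsigned int start_order, bool *warned)
{
	assert(start_order <= SIM_MAX_ORDER);
	*warned = false;

	for (unsigned int order = start_order; order > 0; order--) {
		if (sim_alloc_fast(m, order))
			return (int)order;
	}

	if (sim_alloc_fast(m, 0))
		return 0;

	if (m->reclaimable_pages > 0) {
		/* Simulated direct reclaim frees one page and hands it out. */
		m->reclaimable_pages--;
		return 0;
	}

	*warned = true; /* nothing left: this is where a warning would fire */
	return -1;
}
```

In this model a request starting at order 10 silently degrades to whatever lower order is available, and only an order-0 failure is reported.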

I don't think you understand the problem.

Allocating lower order pages is not really an alternative. You run into a lot of technical issues with that.

The reason we have the fallback is to prevent crashes in OOM situations, in other words to still allow displaying warning messages, for example.

> Under my current workload, the balance skews towards order-0 pages,
> reducing the number of order-10 and order-9 pages by half, when comparing
> runs with direct reclaim and without direct reclaim.

That pretty much completely disqualifies this approach.

This is a clear indicator that your system simply doesn't have enough memory for the workload you are trying to run.

> So, I understand your
> concern with respect to the impact on the GPU TLB and potential flickering.
> Is there a way we can measure it on the devices we are using? And then, if
> it does not prove to be a problem on those devices, would making this a
> per-device setting be acceptable to you? That way we could have in
> userspace a list of devices where it is okay to prefer not to reclaim over
> getting huge pages, and that could be set if the workload prefers lower
> latency in those allocations?

No, you are clearly trying to run a use case which, as far as I can see, we can't really support without running into a lot of trouble sooner or later.

Regards,
Christian.

> 
> Thanks.
> Cascardo.
> 
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Other drivers can later opt to use this mechanism too.
>>>>>
>>>>> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
>>>>> ---
>>>>> Changes in v2:
>>>>> - Make disabling direct reclaim an option.
>>>>> - Link to v1: https://lore.kernel.org/r/20250910-ttm_pool_no_direct_reclaim-v1-1-53b0fa7f80fa@igalia.com
>>>>>
>>>>> ---
>>>>> Thadeu Lima de Souza Cascardo (3):
>>>>>        ttm: pool: allow requests to prefer latency over throughput
>>>>>        ttm: pool: add a module parameter to set latency preference
>>>>>        drm/amdgpu: allow allocation preferences when creating GEM object
>>>>>
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    |  3 ++-
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  3 ++-
>>>>>   drivers/gpu/drm/ttm/ttm_pool.c             | 23 +++++++++++++++++------
>>>>>   drivers/gpu/drm/ttm/ttm_tt.c               |  2 +-
>>>>>   include/drm/ttm/ttm_bo.h                   |  5 +++++
>>>>>   include/drm/ttm/ttm_pool.h                 |  2 +-
>>>>>   include/drm/ttm/ttm_tt.h                   |  2 +-
>>>>>   include/uapi/drm/amdgpu_drm.h              |  9 +++++++++
>>>>>   8 files changed, 38 insertions(+), 11 deletions(-)
>>>>> ---
>>>>> base-commit: f83ec76bf285bea5727f478a68b894f5543ca76e
>>>>> change-id: 20250909-ttm_pool_no_direct_reclaim-ee0807a2d3fe
>>>>>
>>>>> Best regards,
>>>>
>>>
>>

