Message-Id: <20250918-ttm_pool_no_direct_reclaim-v2-1-135294e1f8a2@igalia.com>
Date: Thu, 18 Sep 2025 17:09:24 -0300
From: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
To: Christian Koenig <christian.koenig@....com>, 
 Michel Dänzer <michel.daenzer@...lbox.org>, 
 Huang Rui <ray.huang@....com>, Matthew Auld <matthew.auld@...el.com>, 
 Matthew Brost <matthew.brost@...el.com>, 
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, 
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>, 
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>
Cc: amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org, 
 linux-kernel@...r.kernel.org, kernel-dev@...lia.com, 
 Tvrtko Ursulin <tvrtko.ursulin@...lia.com>, 
 Sergey Senozhatsky <senozhatsky@...omium.org>, 
 Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Subject: [PATCH RFC v2 1/3] ttm: pool: allow requests to prefer latency
 over throughput

The TTM pool allocator prefers to allocate higher-order pages so that the
GPU spends less time walking page tables and achieves better throughput.
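As a rough illustration of why higher orders help, the sketch below counts how many physically contiguous chunks are needed to back a buffer at a given order. Fewer, larger chunks mean fewer mappings to set up and walk. It assumes 4 KiB base pages (the page size is architecture dependent), and `chunks_for_buffer` and `SKETCH_PAGE_SIZE` are hypothetical names, not TTM code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages, as on x86-64; other architectures differ. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: number of contiguous chunks of (1 << order) pages
 * needed to back a buffer of `size` bytes, rounding the last chunk up.
 */
static size_t chunks_for_buffer(size_t size, unsigned int order)
{
	assert(order < 20); /* keep the shift well within range */

	size_t chunk = SKETCH_PAGE_SIZE << order;

	return (size + chunk - 1) / chunk;
}
```

In this model a 2 MiB buffer needs 512 chunks at order 0 but a single chunk at order 9.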

There have been cases where heavily fragmented memory led to a 30% change
in the throughput of a given GPU workload in a datacenter.

On a desktop workload on a low-memory system, however, allocating such
higher-order pages can put the system under memory pressure, triggering
direct reclaim and adding latency to certain desktop operations, even
though allocating lower-order pages would have been possible and would
have avoided such reclaims.
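The trade-off can be sketched with a toy model of the pool's fallback loop, which lowers the order and retries when an allocation fails. This is a simplification under stated assumptions, not the kernel allocator: `struct sketch_mem`, `sketch_try_alloc` and `sketch_alloc` are hypothetical, memory is reduced to the largest currently free order, and a direct reclaim is assumed to always succeed (in the real kernel, higher-order requests also carry `__GFP_NORETRY` and may still fail after reclaiming).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of free memory, not a kernel structure. */
struct sketch_mem {
	unsigned int max_free_order; /* largest block free without reclaim */
	unsigned int reclaims;       /* number of simulated direct reclaims */
};

/*
 * Try one allocation of the given order. If no block that large is free
 * and reclaim is allowed, simulate a (slow) direct reclaim that is assumed
 * to make one available; otherwise fail immediately.
 */
static bool sketch_try_alloc(struct sketch_mem *mem, unsigned int order,
			     bool may_reclaim)
{
	if (order <= mem->max_free_order)
		return true;
	if (!may_reclaim)
		return false;
	mem->reclaims++;
	return true;
}

/*
 * Walk down from max_order, lowering the order after each failure.
 * With prefer_latency, orders above 0 never reclaim, mirroring the patch.
 * Returns the order that succeeded.
 */
static unsigned int sketch_alloc(struct sketch_mem *mem,
				 unsigned int max_order, bool prefer_latency)
{
	for (unsigned int order = max_order; order > 0; order--) {
		if (sketch_try_alloc(mem, order, !prefer_latency))
			return order;
	}
	/* Order 0 is assumed to always succeed in this model. */
	return 0;
}
```

With only order-2 blocks free, a request starting at order 9 triggers one reclaim in the default mode, while the latency-preferring mode settles for order 2 without reclaiming.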

This was observed on ChromeOS when opening multiple tabs and switching
desktops, which led to high latency in those operations.

Add an option to the TTM operation context that allows this behavior to be
set system-wide or per TTM object.
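The flag logic that the patch adds to `ttm_pool_alloc_page()` can be modelled in isolation. This sketch mirrors the patched branch using stand-in flag bits; the `SK_GFP_*` values and the `sk_*` names are hypothetical, chosen for illustration, and do not match the kernel's `__GFP_*` definitions.

```c
#include <assert.h>

/*
 * Stand-in flag bits for illustration only; the real __GFP_* values are
 * defined in the kernel's gfp headers and differ from these.
 */
#define SK_GFP_DIRECT_RECLAIM (1u << 0)
#define SK_GFP_NORETRY        (1u << 1)
#define SK_GFP_NOWARN         (1u << 2)
#define SK_GFP_NOMEMALLOC     (1u << 3)
#define SK_GFP_THISNODE       (1u << 4)

/* Same values as the enum the patch adds to struct ttm_operation_ctx. */
enum sk_alloc_method {
	sk_alloc_default = 0,
	sk_alloc_latency = 2,
	sk_alloc_throughput = 3,
};

/* Same decision as the patched ttm_pool_alloc_page(), over stand-in bits. */
static unsigned int sk_pool_gfp(unsigned int gfp, unsigned int order,
				enum sk_alloc_method method)
{
	if (order) {
		/* Higher orders are opportunistic: no retries, no warnings. */
		gfp |= SK_GFP_NOMEMALLOC | SK_GFP_NORETRY | SK_GFP_NOWARN |
		       SK_GFP_THISNODE;
		/* Latency mode additionally forbids entering direct reclaim. */
		if (method == sk_alloc_latency)
			gfp &= ~SK_GFP_DIRECT_RECLAIM;
	}
	return gfp;
}
```

Order-0 requests are left untouched, so the latency mode only stops higher-order attempts from entering direct reclaim, and the pool can then fall back to smaller pages. In driver code, a caller would presumably opt in by setting `.alloc_method = ttm_op_alloc_latency` in the `struct ttm_operation_ctx` it passes to TTM.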

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 11 +++++++----
 include/drm/ttm/ttm_bo.h       |  5 +++++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index baf27c70a4193a121fbc8b4e67cd6feb4c612b85..02c622a103fcece003bd70ce6b5833ada70f5228 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -133,7 +133,8 @@ static DECLARE_RWSEM(pool_shrink_rwsem);
 
 /* Allocate pages of size 1 << order with the given gfp_flags */
 static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
-					unsigned int order)
+					unsigned int order,
+					const struct ttm_operation_ctx *ctx)
 {
 	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
 	struct ttm_pool_dma *dma;
@@ -144,9 +145,12 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 	 * Mapping pages directly into an userspace process and calling
 	 * put_page() on a TTM allocated page is illegal.
 	 */
-	if (order)
+	if (order) {
 		gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
 			__GFP_THISNODE;
+		if (ctx->alloc_method == ttm_op_alloc_latency)
+			gfp_flags &= ~__GFP_DIRECT_RECLAIM;
+	}
 
 	if (!pool->use_dma_alloc) {
 		p = alloc_pages_node(pool->nid, gfp_flags, order);
@@ -745,7 +749,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 		if (!p) {
 			page_caching = ttm_cached;
 			allow_pools = false;
-			p = ttm_pool_alloc_page(pool, gfp_flags, order);
+			p = ttm_pool_alloc_page(pool, gfp_flags, order, ctx);
 		}
 		/* If that fails, lower the order if possible and retry. */
 		if (!p) {
@@ -815,7 +819,6 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 		return -EINVAL;
 
 	ttm_pool_alloc_state_init(tt, &alloc);
-
 	return __ttm_pool_alloc(pool, tt, ctx, &alloc, NULL);
 }
 EXPORT_SYMBOL(ttm_pool_alloc);
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 479b7ed075c0ffba21df971db7fef914c531a51d..8531f8e8bb9b079927d0e4759a12819303542f62 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -184,6 +184,11 @@ struct ttm_operation_ctx {
 	bool no_wait_gpu;
 	bool gfp_retry_mayfail;
 	bool allow_res_evict;
+	enum {
+		ttm_op_alloc_default = 0,
+		ttm_op_alloc_latency = 2,
+		ttm_op_alloc_throughput = 3,
+	} alloc_method;
 	struct dma_resv *resv;
 	uint64_t bytes_moved;
 };

-- 
2.47.3
