Message-ID: <20250224125124.34dc1e7a@collabora.com>
Date: Mon, 24 Feb 2025 12:51:24 +0100
From: Boris Brezillon <boris.brezillon@...labora.com>
To: Adrián Larumbe <adrian.larumbe@...labora.com>
Cc: Steven Price <steven.price@....com>, Liviu Dudau <liviu.dudau@....com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard
<mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>, David Airlie
<airlied@...il.com>, Simona Vetter <simona@...ll.ch>, Mihail Atanassov
<mihail.atanassov@....com>, kernel@...labora.com,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/2] drm/panthor: Avoid sleep locking in the internal
BO size path
Hi Adrian,
On Thu, 20 Feb 2025 20:26:23 +0000
Adrián Larumbe <adrian.larumbe@...labora.com> wrote:
> Hi Boris,
>
> On 15.02.2025 10:44, Boris Brezillon wrote:
> > On Fri, 14 Feb 2025 20:55:21 +0000
> > Adrián Larumbe <adrian.larumbe@...labora.com> wrote:
> >
> > > Commit 434e5ca5b5d7 ("drm/panthor: Expose size of driver internal BO's over
> > > fdinfo") locks the VMS xarray, to avoid UAF errors when the same VM is
> > > being concurrently destroyed by another thread. However, that puts the
> > > current thread in atomic context, which means taking the VMS' heap locks
> > > will trigger a warning as the thread is no longer allowed to sleep.
> > >
> > > Because in this case replacing the heap mutex with a spinlock isn't
> > > feasible, the fdinfo handler no longer traverses the list of heaps for
> > > every single VM associated with an open DRM file. Instead, when a new heap
> > > chunk is allocated, its size is accumulated into a VM-wide tally, which
> > > also makes the atomic context code path somewhat faster.
> > >
> > > Signed-off-by: Adrián Larumbe <adrian.larumbe@...labora.com>
> > > Fixes: 3e2c8c718567 ("drm/panthor: Expose size of driver internal BO's over fdinfo")
> > > ---
> > > drivers/gpu/drm/panthor/panthor_heap.c | 38 ++++++++------------------
> > > drivers/gpu/drm/panthor/panthor_heap.h | 2 --
> > > drivers/gpu/drm/panthor/panthor_mmu.c | 23 +++++++++++-----
> > > drivers/gpu/drm/panthor/panthor_mmu.h | 1 +
> > > 4 files changed, 28 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_heap.c b/drivers/gpu/drm/panthor/panthor_heap.c
> > > index db0285ce5812..e5e5953e4f87 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_heap.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_heap.c
> > > @@ -127,6 +127,8 @@ static void panthor_free_heap_chunk(struct panthor_vm *vm,
> > > heap->chunk_count--;
> > > mutex_unlock(&heap->lock);
> > >
> > > + panthor_vm_heaps_size_accumulate(vm, -heap->chunk_size);
> > > +
> > > panthor_kernel_bo_destroy(chunk->bo);
> > > kfree(chunk);
> > > }
> > > @@ -180,6 +182,8 @@ static int panthor_alloc_heap_chunk(struct panthor_device *ptdev,
> > > heap->chunk_count++;
> > > mutex_unlock(&heap->lock);
> > >
> > > + panthor_vm_heaps_size_accumulate(vm, heap->chunk_size);
> > > +
> > > return 0;
> > >
> > > err_destroy_bo:
> > > @@ -389,6 +393,7 @@ int panthor_heap_return_chunk(struct panthor_heap_pool *pool,
> > > removed = chunk;
> > > list_del(&chunk->node);
> > > heap->chunk_count--;
> > > + panthor_vm_heaps_size_accumulate(chunk->bo->vm, -heap->chunk_size);
> > > break;
> > > }
> > > }
> > > @@ -560,6 +565,8 @@ panthor_heap_pool_create(struct panthor_device *ptdev, struct panthor_vm *vm)
> > > if (ret)
> > > goto err_destroy_pool;
> > >
> > > + panthor_vm_heaps_size_accumulate(vm, pool->gpu_contexts->obj->size);
> > > +
> > > return pool;
> > >
> > > err_destroy_pool:
> > > @@ -594,8 +601,11 @@ void panthor_heap_pool_destroy(struct panthor_heap_pool *pool)
> > > xa_for_each(&pool->xa, i, heap)
> > > drm_WARN_ON(&pool->ptdev->base, panthor_heap_destroy_locked(pool, i));
> > >
> > > - if (!IS_ERR_OR_NULL(pool->gpu_contexts))
> > > + if (!IS_ERR_OR_NULL(pool->gpu_contexts)) {
> > > + panthor_vm_heaps_size_accumulate(pool->gpu_contexts->vm,
> > > + -pool->gpu_contexts->obj->size);
> > > panthor_kernel_bo_destroy(pool->gpu_contexts);
> > > + }
> > >
> > > /* Reflects the fact the pool has been destroyed. */
> > > pool->vm = NULL;
> > > @@ -603,29 +613,3 @@ void panthor_heap_pool_destroy(struct panthor_heap_pool *pool)
> > >
> > > panthor_heap_pool_put(pool);
> > > }
> > > -
> > > -/**
> > > - * panthor_heap_pool_size() - Calculate size of all chunks across all heaps in a pool
> > > - * @pool: Pool whose total chunk size to calculate.
> > > - *
> > > - * This function adds the size of all heap chunks across all heaps in the
> > > - * argument pool. It also adds the size of the gpu contexts kernel bo.
> > > - * It is meant to be used by fdinfo for displaying the size of internal
> > > - * driver BO's that aren't exposed to userspace through a GEM handle.
> > > - *
> > > - */
> > > -size_t panthor_heap_pool_size(struct panthor_heap_pool *pool)
> > > -{
> > > - struct panthor_heap *heap;
> > > - unsigned long i;
> > > - size_t size = 0;
> > > -
> > > - down_read(&pool->lock);
> > > - xa_for_each(&pool->xa, i, heap)
> > > - size += heap->chunk_size * heap->chunk_count;
> > > - up_read(&pool->lock);
> > > -
> > > - size += pool->gpu_contexts->obj->size;
> > > -
> > > - return size;
> > > -}
> > > diff --git a/drivers/gpu/drm/panthor/panthor_heap.h b/drivers/gpu/drm/panthor/panthor_heap.h
> > > index e3358d4e8edb..25a5f2bba445 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_heap.h
> > > +++ b/drivers/gpu/drm/panthor/panthor_heap.h
> > > @@ -27,8 +27,6 @@ struct panthor_heap_pool *
> > > panthor_heap_pool_get(struct panthor_heap_pool *pool);
> > > void panthor_heap_pool_put(struct panthor_heap_pool *pool);
> > >
> > > -size_t panthor_heap_pool_size(struct panthor_heap_pool *pool);
> > > -
> > > int panthor_heap_grow(struct panthor_heap_pool *pool,
> > > u64 heap_gpu_va,
> > > u32 renderpasses_in_flight,
> > > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> > > index 8c6fc587ddc3..9e48b34fcf80 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > > @@ -347,6 +347,14 @@ struct panthor_vm {
> > > struct mutex lock;
> > > } heaps;
> > >
> > > + /**
> > > + * @fdinfo: VM-wide fdinfo fields.
> > > + */
> > > + struct {
> > > + /** @fdinfo.heaps_size: Size of all chunks across all heaps in the pool. */
> > > + atomic_t heaps_size;
> > > + } fdinfo;
> >
> > Feels more like a panthor_heap_pool field to me. If you do that,
> > you can keep the panthor_heap_pool_size() helper.
>
> The only downside of storing the chunk-size total in a per-heap-pool fdinfo field is that we'd
> have to traverse all the heap pools owned by a VM every time the fdinfo handler for an open
> DRM file is invoked. That means spending longer with the vms xarray lock held.
There's only one heap pool per VM though, and once the pool is created
it can't go away, so you don't even have to take the lock to dereference
the panthor_vm::heaps::pool object; a NULL check is enough.
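To make that concrete, here's a rough sketch of the lock-free read path
this allows (panthor_vm_heaps_size() is a made-up name for illustration;
the panthor_vm fields are private to panthor_mmu.c, so in practice this
would have to live there):

size_t panthor_vm_heaps_size(struct panthor_vm *vm)
{
	/*
	 * The heap pool is created at most once and stays alive for
	 * the VM's lifetime, so no lock is needed to dereference it;
	 * a NULL check covers VMs that never created a heap pool.
	 */
	if (!vm->heaps.pool)
		return 0;

	return panthor_heap_pool_size(vm->heaps.pool);
}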
> > >
> > > +void panthor_vm_heaps_size_accumulate(struct panthor_vm *vm, ssize_t acc)
> > > +{
> > > + atomic_add(acc, &vm->fdinfo.heaps_size);
> > > +}
> >
> > Calling atomic_add() directly would probably be shorter, and I prefer
> > the idea of calling atomic_sub(size) instead of atomic_add(-size), so
> > how about we drop this helper and use atomic_add/sub() directly?
>
> I had to add this VM interface function because struct panthor_vm's fields are hidden from
> other compilation units, as the struct is only defined inside panthor_mmu.c. I agree using
> atomic_sub() would be clearer, but that would mean exporting yet another panthor_mmu
> symbol, and atomic_add() can take signed values anyway.
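Right, atomic_add() does take a signed int, so a single accumulate
helper works for both directions. A toy illustration of that point
(not driver code, just <linux/atomic.h>):

#include <linux/atomic.h>

static atomic_t heaps_size = ATOMIC_INIT(0);

static void heaps_size_demo(int chunk_size)
{
	atomic_add(chunk_size, &heaps_size);	/* heap chunk allocated */
	atomic_add(-chunk_size, &heaps_size);	/* heap chunk freed; same
						 * effect as atomic_sub() */
}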
That said, if you move the "heaps_size" field to panthor_heap_pool
(and rename it to "size" once it lives under panthor_heap_pool::fdinfo,
BTW), you no longer have this problem, because all users of this field
live in panthor_heap.c.
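For the record, a sketch of what that could look like (rough and
untested; the exact layout is just my suggestion, not a patch):

struct panthor_heap_pool {
	/* ... existing fields ... */

	/** @fdinfo: Pool-wide fdinfo fields. */
	struct {
		/**
		 * @fdinfo.size: Size of all chunks across all heaps in
		 * the pool, plus the gpu_contexts BO (folded in at pool
		 * creation time).
		 */
		atomic_t size;
	} fdinfo;
};

size_t panthor_heap_pool_size(struct panthor_heap_pool *pool)
{
	return pool ? atomic_read(&pool->fdinfo.size) : 0;
}

with the alloc/free paths in panthor_heap.c calling atomic_add() and
atomic_sub() on pool->fdinfo.size directly.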
Regards,
Boris