Message-ID: <20230904103632.1f88ad89@collabora.com>
Date: Mon, 4 Sep 2023 10:36:32 +0200
From: Boris Brezillon <boris.brezillon@...labora.com>
To: Dmitry Osipenko <dmitry.osipenko@...labora.com>
Cc: David Airlie <airlied@...il.com>,
Gerd Hoffmann <kraxel@...hat.com>,
Gurchetan Singh <gurchetansingh@...omium.org>,
Chia-I Wu <olvaffe@...il.com>, Daniel Vetter <daniel@...ll.ch>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>,
Christian König <christian.koenig@....com>,
Qiang Yu <yuq825@...il.com>,
Steven Price <steven.price@....com>,
Emma Anholt <emma@...olt.net>, Melissa Wen <mwen@...lia.com>,
Will Deacon <will@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Boqun Feng <boqun.feng@...il.com>,
Mark Rutland <mark.rutland@....com>,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
kernel@...labora.com, virtualization@...ts.linux-foundation.org,
intel-gfx@...ts.freedesktop.org
Subject: Re: [PATCH v15 17/23] drm/shmem-helper: Add and use
drm_gem_shmem_resv_assert_held() helper

On Sat, 2 Sep 2023 22:43:02 +0300
Dmitry Osipenko <dmitry.osipenko@...labora.com> wrote:
> On 8/29/23 10:29, Boris Brezillon wrote:
> > On Tue, 29 Aug 2023 05:34:23 +0300
> > Dmitry Osipenko <dmitry.osipenko@...labora.com> wrote:
> >
> >> On 8/28/23 13:12, Boris Brezillon wrote:
> >>> On Sun, 27 Aug 2023 20:54:43 +0300
> >>> Dmitry Osipenko <dmitry.osipenko@...labora.com> wrote:
> >>>
> >>>> In preparation for adding the drm-shmem memory shrinker, move all reservation
> >>>> locking lockdep checks to use the new drm_gem_shmem_resv_assert_held(), which
> >>>> will resolve a spurious lockdep warning about wrong locking order vs
> >>>> fs_reclaim code paths during freeing of a shmem GEM, where lockdep isn't
> >>>> aware that it's impossible to have locking contention with fs_reclaim
> >>>> at this special time.
> >>>>
> >>>> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@...labora.com>
> >>>> ---
> >>>> drivers/gpu/drm/drm_gem_shmem_helper.c | 37 +++++++++++++++++---------
> >>>> 1 file changed, 25 insertions(+), 12 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
> >>>> index d96fee3d6166..ca5da976aafa 100644
> >>>> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> >>>> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> >>>> @@ -128,6 +128,23 @@ struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t
> >>>> }
> >>>> EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
> >>>>
> >>>> +static void drm_gem_shmem_resv_assert_held(struct drm_gem_shmem_object *shmem)
> >>>> +{
> >>>> + /*
> >>>> + * Destroying the object is a special case.. drm_gem_shmem_free()
> >>>> + * calls many things that WARN_ON if the obj lock is not held. But
> >>>> + * acquiring the obj lock in drm_gem_shmem_free() can cause a locking
> >>>> + * order inversion between reservation_ww_class_mutex and fs_reclaim.
> >>>> + *
> >>>> + * This deadlock is not actually possible, because no one should
> >>>> + * be already holding the lock when drm_gem_shmem_free() is called.
> >>>> + * Unfortunately lockdep is not aware of this detail. So when the
> >>>> + * refcount drops to zero, we pretend it is already locked.
> >>>> + */
> >>>> + if (kref_read(&shmem->base.refcount))
> >>>> + dma_resv_assert_held(shmem->base.resv);
> >>>> +}
> >>>> +
> >>>> /**
> >>>> * drm_gem_shmem_free - Free resources associated with a shmem GEM object
> >>>> * @shmem: shmem GEM object to free
> >>>> @@ -142,8 +159,6 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem)
> >>>> if (obj->import_attach) {
> >>>> drm_prime_gem_destroy(obj, shmem->sgt);
> >>>> } else if (!shmem->imported_sgt) {
> >>>> - dma_resv_lock(shmem->base.resv, NULL);
> >>>> -
> >>>> drm_WARN_ON(obj->dev, kref_read(&shmem->vmap_use_count));
> >>>>
> >>>> if (shmem->sgt) {
> >>>> @@ -156,8 +171,6 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem)
> >>>> drm_gem_shmem_put_pages_locked(shmem);
> >>>
> >>> AFAICT, drm_gem_shmem_put_pages_locked() is the only function that's
> >>> called in the free path and would complain about resv-lock not being
> >>> held. I think I'd feel more comfortable if we were adding a
> >>> drm_gem_shmem_free_pages() function that did everything
> >>> drm_gem_shmem_put_pages_locked() does except for the lock_held() check
> >>> and the refcount dec, and have it called here (and in
> >>> drm_gem_shmem_put_pages_locked()). This way we can keep using
> >>> dma_resv_assert_held() instead of having our own variant.
> >>
> >> It's not only drm_gem_shmem_free_pages(), but any drm-shmem function
> >> that drivers may use in the GEM's freeing callback.
> >>
> >> For example, panfrost_gem_free_object() may unpin shmem BO and then do
> >> drm_gem_shmem_free().
> >
> > Is this really a valid use case? If the GEM refcount dropped to zero,
> > we should certainly not have pages_pin_count > 0 (thinking of vmap-ed
> > buffers that might disappear while kernel still has a pointer to the
> > CPU-mapped area). The only reason we have this
> > drm_gem_shmem_put_pages_locked() in drm_gem_shmem_free() is because of
> > this implicit ref hold by the sgt, and IMHO, we should be stricter and
> > check that pages_use_count == 1 when sgt != NULL and pages_use_count ==
> > 0 otherwise.
> >
> > I actually think it's a good thing to try and catch any attempt to call
> > functions trying to lock the resv in a path they're not supposed to. At
> > least we can decide whether these actions are valid or not in this
> > context, and provide dedicated helpers for the free path if they are.
>
> To me it's a valid use-case. I was going to do it for the virtio-gpu
> driver for a specific BO type that should be permanently pinned in
> memory. So I made the BO pinned in the virtio-gpu's bo_create() and
> unpinned it from the virtio-gpu's gem->free(), this is a perfectly valid
> case to me. Though, in the end I switched to another approach that
> doesn't require doing the pinning in the virtio-gpu driver.

I'm not saying driver-specific gem_create() methods can't own an
implicit ref on pages, but if you don't check that
pages_{use,ref}_count <= max_implicit_refs, page ref leaks can go
unnoticed. If your driver has a pin_on_creation flag, it should take a
ref in the creation path and release that ref in the gem_free() path,
before calling drm_gem_shmem_free(). That way the shmem layer can still
check in drm_gem_shmem_free() that at most one implicit ref is left
(the one taken by the sgt creation logic).
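
A rough user-space sketch of that rule, only to illustrate the
accounting. This is not kernel code: struct shmem_model, the model_*()
functions and the pin_on_creation field are all made up for the
example, and the real fields and helpers in drm_gem_shmem_helper.c
differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a shmem GEM object (hypothetical, not the kernel struct). */
struct shmem_model {
	unsigned int pages_use_count;	/* references held on the backing pages */
	bool has_sgt;			/* sgt creation holds one implicit page ref */
	bool pin_on_creation;		/* hypothetical driver-specific flag */
};

/* Hypothetical driver create path: take a driver-owned ref when pinning. */
static void model_driver_create(struct shmem_model *bo, bool pin)
{
	bo->pages_use_count = 0;
	bo->has_sgt = false;
	bo->pin_on_creation = pin;
	if (pin)
		bo->pages_use_count++;
}

/* Models sgt creation taking its single implicit page ref. */
static void model_get_sgt(struct shmem_model *bo)
{
	if (!bo->has_sgt) {
		bo->has_sgt = true;
		bo->pages_use_count++;
	}
}

/*
 * Models the check the shmem layer could do in its free path: at most
 * one implicit ref (the sgt one) may be left. Returns true if a leak
 * was detected, i.e. where the real code would warn.
 */
static bool model_shmem_free(struct shmem_model *bo)
{
	unsigned int max_implicit_refs = bo->has_sgt ? 1 : 0;
	bool leaked = bo->pages_use_count > max_implicit_refs;

	/* Drop the implicit sgt ref, if any. */
	if (bo->has_sgt && bo->pages_use_count > 0) {
		bo->pages_use_count--;
		bo->has_sgt = false;
	}
	return leaked;
}

/* Hypothetical driver free path: release the driver's own ref first. */
static bool model_driver_free(struct shmem_model *bo)
{
	if (bo->pin_on_creation) {
		assert(bo->pages_use_count > 0);
		bo->pages_use_count--;
		bo->pin_on_creation = false;
	}
	return model_shmem_free(bo);
}
```

With this accounting, a driver that goes through model_driver_free()
reaches the shmem free path with only the sgt ref left, while a driver
that forgets to drop its creation-time ref trips the check instead of
silently leaking pages.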