Message-ID: <CAF6AEGsOTNedZhuBzipSQgNpG0SyVObaeq+g5U1hGUFfRYjw8w@mail.gmail.com>
Date: Thu, 15 May 2025 14:57:46 -0700
From: Rob Clark <robdclark@...il.com>
To: Danilo Krummrich <dakr@...nel.org>
Cc: dri-devel@...ts.freedesktop.org, freedreno@...ts.freedesktop.org, 
	linux-arm-msm@...r.kernel.org, Connor Abbott <cwabbott0@...il.com>, 
	Rob Clark <robdclark@...omium.org>, 
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>, 
	Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>, 
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 01/40] drm/gpuvm: Don't require obj lock in destructor path

On Thu, May 15, 2025 at 10:55 AM Danilo Krummrich <dakr@...nel.org> wrote:
>
> On Thu, May 15, 2025 at 10:35:21AM -0700, Rob Clark wrote:
> > On Thu, May 15, 2025 at 2:06 AM Danilo Krummrich <dakr@...nel.org> wrote:
> > >
> > > On Thu, May 15, 2025 at 10:54:27AM +0200, Danilo Krummrich wrote:
> > > > Hi Rob,
> > > >
> > > > Can you please CC me on patches for GPUVM?
> > > >
> > > > On Wed, May 14, 2025 at 10:53:15AM -0700, Rob Clark wrote:
> > > > > From: Rob Clark <robdclark@...omium.org>
> > > > >
> > > > > See commit a414fe3a2129 ("drm/msm/gem: Drop obj lock in
> > > > > msm_gem_free_object()") for justification.
> > > >
> > > > Please write a proper commit message that explains the problem and the solution.
> > > > Please don't just refer to another commit and leave it to the reviewer of the
> > > > patch to figure this out.
> > > >
> > > > > Signed-off-by: Rob Clark <robdclark@...omium.org>
> > > > > ---
> > > > >  drivers/gpu/drm/drm_gpuvm.c | 7 +++++--
> > > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > >
> > > >
> > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > > > > index f9eb56f24bef..1e89a98caad4 100644
> > > > > --- a/drivers/gpu/drm/drm_gpuvm.c
> > > > > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > > > > @@ -1511,7 +1511,9 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> > > > >     drm_gpuvm_bo_list_del(vm_bo, extobj, lock);
> > > > >     drm_gpuvm_bo_list_del(vm_bo, evict, lock);
> > > > >
> > > > > -   drm_gem_gpuva_assert_lock_held(obj);
> > > > > +   if (kref_read(&obj->refcount) > 0)
> > > > > +           drm_gem_gpuva_assert_lock_held(obj);
> > > > > +
> > > > >     list_del(&vm_bo->list.entry.gem);
> > > >
> > > > This seems wrong.
> > > >
> > > > A VM_BO object keeps a reference of the underlying GEM object, so this should
> > > > never happen.
> > > >
> > > > This function calls drm_gem_object_put() before it returns.
> > >
> > > I noticed your subsequent patch that allows VM_BO structures to have weak
> > > references to GEM objects.
> > >
> > > However, even with that this seems wrong. If the reference count of the GEM
> > > object is zero when drm_gpuvm_bo_destroy() is called it means that the GEM
> > > object is dead. However, until drm_gpuvm_bo_destroy() is called the GEM object
> > > potentially remains on the extobj and evicted lists, which means that other
> > > code paths might fetch it from those lists and consider it to be a valid GEM
> > > object.
> >
> > We only iterate extobj or evicted in VM_BIND mode, where we aren't
> > using WEAK_REF.  I suppose some WARN_ON()s or BUG_ON()s could make
> > this more clear.
>
> There is also the GEM object's list of VM_BOs, are you using that?

Yes, but at this point there are no more refs to the obj, and that
list is obj-specific.

> Anyways, I don't agree with that. Even if you can tweak your driver to not run
> into trouble with this, we can't introduce a mode that violates GPUVM's internal
> lifetimes and subsequently fix it up with WARN_ON() or BUG_ON().
>
> I still don't see a real technical reason why msm can't be reworked to follow
> those lifetime rules.

The basic issue is that it would be really awkward to have two
side-by-side VM/VMA management/tracking systems, but in legacy mode
the reference holding goes in the opposite direction.  (At the same
time, legacy mode doesn't need/use most of the features of gpuvm.)
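
As a rough illustration of the two reference directions, here is a
userspace toy model.  All names (model_obj, model_vm_bo, demo_*) are
made up for this sketch; it is not the gpuvm or msm code, and it only
assumes that "weak" means the VM_BO does not pin the object.  It shows
what refcount the VM_BO teardown observes in each mode:

```c
/* Userspace toy model only: hypothetical names, not the kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_obj {
	int refcount;   /* stands in for the GEM object's kref */
	int nr_vm_bos;  /* stands in for the object's list of VM_BOs */
};

struct model_vm_bo {
	struct model_obj *obj;
	bool weak_ref;  /* true: the VM_BO does not pin the object */
};

static struct model_vm_bo *
model_vm_bo_create(struct model_obj *obj, bool weak_ref)
{
	struct model_vm_bo *vm_bo = malloc(sizeof(*vm_bo));

	if (!vm_bo)
		return NULL;
	vm_bo->obj = obj;
	vm_bo->weak_ref = weak_ref;
	if (!weak_ref)
		obj->refcount++;  /* strong mode: VM_BO holds a reference */
	obj->nr_vm_bos++;
	return vm_bo;
}

/* Returns the object refcount observed while unlinking from the object. */
static int model_vm_bo_destroy(struct model_vm_bo *vm_bo)
{
	struct model_obj *obj = vm_bo->obj;
	int seen = obj->refcount;

	obj->nr_vm_bos--;
	if (!vm_bo->weak_ref)
		obj->refcount--;  /* analogous to the put at the end of destroy */
	free(vm_bo);
	return seen;
}

/* Strong mode: the VM_BO keeps the object alive past the last user ref. */
static int demo_strong(void)
{
	struct model_obj obj = { .refcount = 1, .nr_vm_bos = 0 };
	struct model_vm_bo *vm_bo = model_vm_bo_create(&obj, false);

	assert(vm_bo);
	obj.refcount--;  /* last external reference dropped */
	return model_vm_bo_destroy(vm_bo);
}

/* Weak mode: the object's own free path tears down the VM_BO. */
static int demo_weak(void)
{
	struct model_obj obj = { .refcount = 1, .nr_vm_bos = 0 };
	struct model_vm_bo *vm_bo = model_vm_bo_create(&obj, true);

	assert(vm_bo);
	obj.refcount--;  /* last reference dropped; object is being freed */
	return model_vm_bo_destroy(vm_bo);
}
```

In this model demo_strong() sees a non-zero refcount during teardown,
while demo_weak() sees zero, which is the case the kref_read() check
in the patch is keyed on.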

BR,
-R
