[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f551ee9059e52d46343f5fa997b7d9f8ab6654d9.camel@linux.intel.com>
Date: Tue, 03 Oct 2023 14:25:56 +0200
From: Thomas Hellström
<thomas.hellstrom@...ux.intel.com>
To: Boris Brezillon <boris.brezillon@...labora.com>
Cc: Danilo Krummrich <dakr@...hat.com>, airlied@...il.com,
daniel@...ll.ch, matthew.brost@...el.com, sarah.walker@...tec.com,
donald.robson@...tec.com, christian.koenig@....com,
faith@...strand.net, dri-devel@...ts.freedesktop.org,
nouveau@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate
external/evicted objects
Hi, Boris,
On Tue, 2023-10-03 at 12:05 +0200, Boris Brezillon wrote:
> Hello Thomas,
>
> On Tue, 3 Oct 2023 10:36:10 +0200
> Thomas Hellström <thomas.hellstrom@...ux.intel.com> wrote:
>
> > > +/**
> > > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > > + * @__gpuvm: The GPU VM
> > > + * @__list_name: The name of the list we're iterating on
> > > + * @__local_list: A pointer to the local list used to store
> > > already iterated items
> > > + * @__prev_vm_bo: The previous element we got from
> > > drm_gpuvm_get_next_cached_vm_bo()
> > > + *
> > > + * This helper is here to provide lockless list iteration.
> > > Lockless as in, the
> > > + * iterator releases the lock immediately after picking the
> > > first element from
> > > + * the list, so list insertion deletion can happen concurrently.
> > > + *
> > > + * Elements popped from the original list are kept in a local
> > > list, so removal
> > > + * and is_empty checks can still happen while we're iterating
> > > the list.
> > > + */
> > > +#define get_next_vm_bo_from_list(__gpuvm, __list_name,
> > > __local_list, __prev_vm_bo) \
> > > + ({
> > > \
> > > + struct drm_gpuvm_bo *__vm_bo =
> > > NULL; \
> > > +
> > > \
> > > + drm_gpuvm_bo_put(__prev_vm_bo);
> > > \
> > > +
> > > \
> > > + spin_lock(&(__gpuvm)-
> > > >__list_name.lock); \
> >
> > Here we unconditionally take the spinlocks while iterating, and the
> > main
> > point of DRM_GPUVM_RESV_PROTECTED was really to avoid that?
> >
> >
> > > + if (!(__gpuvm)-
> > > >__list_name.local_list) \
> > > + (__gpuvm)->__list_name.local_list =
> > > __local_list; \
> > > + else
> > > \
> > > + WARN_ON((__gpuvm)->__list_name.local_list
> > > != __local_list); \
> > > +
> > > \
> > > + while (!list_empty(&(__gpuvm)->__list_name.list))
> > > { \
> > > + __vm_bo = list_first_entry(&(__gpuvm)-
> > > >__list_name.list, \
> > > + struct
> > > drm_gpuvm_bo, \
> > > +
> > > list.entry.__list_name); \
> > > + if (kref_get_unless_zero(&__vm_bo->kref))
> > > {
> > And unnecessarily grab a reference in the RESV_PROTECTED case.
> > > \
> > > + list_move_tail(&(__vm_bo)-
> > > >list.entry.__list_name, \
> > > +
> > > __local_list); \
> > > + break;
> > > \
> > > + } else
> > > { \
> > > + list_del_init(&(__vm_bo)-
> > > >list.entry.__list_name); \
> > > + __vm_bo =
> > > NULL; \
> > > + }
> > > \
> > > + }
> > > \
> > > + spin_unlock(&(__gpuvm)-
> > > >__list_name.lock); \
> > > +
> > > \
> > > + __vm_bo;
> > > \
> > > + })
> >
> > IMHO this lockless list iteration looks very complex and should be
> > pretty difficult to maintain while moving forward, also since it
> > pulls
> > the gpuvm_bos off the list, list iteration needs to be protected by
> > an
> > outer lock anyway.
>
> As being partly responsible for this convoluted list iterator, I must
> say I agree with you. There's so many ways this can go wrong if the
> user doesn't call it the right way, or doesn't protect concurrent
> list
> iterations with a separate lock (luckily, this is a private
> iterator). I
> mean, it works, so there's certainly a way to get it right, but gosh,
> this is so far from the simple API I had hoped for.
>
> > Also from what I understand from Boris, the extobj
> > list would typically not need the fine-grained locking; only the
> > evict
> > list?
>
> Right, I'm adding the gpuvm_bo to extobj list in the ioctl path, when
> the GEM and VM resvs are held, and I'm deferring the
> drm_gpuvm_bo_put()
> call to a work that's not in the dma-signalling path. This being
> said,
> I'm still not comfortable with the
>
> gem = drm_gem_object_get(vm_bo->gem);
> dma_resv_lock(gem->resv);
> drm_gpuvm_bo_put(vm_bo);
> dma_resv_unlock(gem->resv);
> drm_gem_object_put(gem);
>
> dance that's needed to avoid a UAF when the gpuvm_bo is the last GEM
> owner, not to mention that drm_gpuva_unlink() calls
> drm_gpuvm_bo_put()
> after making sure the GEM gpuvm_list lock is held, but this lock
> might
> differ from the resv lock (custom locking so we can call
> gpuvm_unlink() in the dma-signalling path). So we now have paths
> where
> drm_gpuvm_bo_put() are called with the resv lock held, and others
> where
> they are not, and that only works because we're relying on the the
> fact
> those drm_gpuvm_bo_put() calls won't make the refcount drop to zero,
> because the deferred vm_bo_put() work still owns a vm_bo ref.
I'm not sure I follow to 100% here, but in the code snippet above it's
pretty clear to me that it needs to hold an explicit gem object
reference when calling dma_resv_unlock(gem->resv). Each time you copy a
referenced pointer (here from vm_bo->gem to gem) you need to up the
refcount unless you make sure (by locks or other means) that the source
of the copy has a strong refcount and stays alive, so that's no weird
action to me. Could possibly add a drm_gpuvm_bo_get_gem() to access the
gem member (and that also takes a refcount) for driver users to avoid
the potential pitfall.
>
> All these tiny details add to the overall complexity of this common
> layer, and to me, that's not any better than the
> get_next_vm_bo_from_list() complexity you were complaining about
> (might
> be even worth, because this sort of things leak to users).
>
> Having an internal lock partly solves that, in that the locking of
> the
> extobj list is now entirely orthogonal to the GEM that's being
> removed
> from this list, and we can lock/unlock internally without forcing the
> caller to take weird actions to make sure things don't explode. Don't
> get me wrong, I get that this locking overhead is not acceptable for
> Xe, but I feel like we're turning drm_gpuvm into a white elephant
> that
> only few people will get right.
I tend to agree, but to me the big complication comes from the async
(dma signalling path) state updates.
Let's say for example we have a lower level lock for the gem object's
gpuvm_bo list. Some drivers grab it from the dma fence signalling path,
other drivers need to access all vm's of a bo to grab their dma_resv
locks using a WW transaction. There will be problems, although probably
solveable.
>
> This is just my personal view on this, and I certainly don't want to
> block or delay the merging of this patchset, but I thought I'd share
> my
> concerns. As someone who's been following the evolution of this
> drm_gpuva/vm series for weeks, and who's still sometimes getting
> lost,
> I can't imagine how new drm_gpuvm users would feel...
>
> > Also it seems that if we are to maintain two modes here, for
> > reasonably clean code we'd need two separate instances of
> > get_next_bo_from_list().
> >
> > For the !RESV_PROTECTED case, perhaps one would want to consider
> > the
> > solution used currently in xe, where the VM maintains two evict
> > lists.
> > One protected by a spinlock and one protected by the VM resv. When
> > the
> > VM resv is locked to begin list traversal, the spinlock is locked
> > *once*
> > and the spinlock-protected list is looped over and copied into the
> > resv
> > protected one. For traversal, the resv protected one is used.
>
> Oh, so you do have the same sort of trick where you move the entire
> list to another list, such that you can let other paths update the
> list
> while you're iterating your own snapshot. That's interesting...
Yes, it's instead of the "evicted" bool suggested here. I thought the
latter would be simpler. Although that remains to be seen after all
use-cases are implemented.
But in general I think the concept of copying from a staging list to
another with different protection rather than traversing the first list
and unlocking between items is a good way of solving the locking
inversion problem with minimal overhead. We use it also for O(1)
userptr validation.
/Thomas
>
> Regards,
>
> Boris
Powered by blists - more mailing lists