Message-ID: <2c4bbd8c-ec2c-91ad-9f27-5476b7e65b4f@redhat.com>
Date: Thu, 14 Sep 2023 19:15:55 +0200
From: Danilo Krummrich <dakr@...hat.com>
To: Thomas Hellström
<thomas.hellstrom@...ux.intel.com>
Cc: matthew.brost@...el.com, sarah.walker@...tec.com,
nouveau@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
dri-devel@...ts.freedesktop.org, boris.brezillon@...labora.com,
donald.robson@...tec.com, daniel@...ll.ch,
christian.koenig@....com, faith.ekstrand@...labora.com
Subject: Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize
dma_resv/extobj handling and GEM validation
On 9/14/23 19:13, Thomas Hellström wrote:
> On Thu, 2023-09-14 at 17:27 +0200, Danilo Krummrich wrote:
>> On 9/14/23 13:32, Thomas Hellström wrote:
>>>
>>> On 9/14/23 12:57, Danilo Krummrich wrote:
>>>> On 9/13/23 14:16, Danilo Krummrich wrote:
>>>>
>>>> <snip>
>>>>
>>>>>>> And validate() can remove it while still holding all dma-
>>>>>>> resv locks,
>>>>>>> neat!
>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>> concurrently? What
>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>> drm_gpuva_unlink()?
>>>>>>> Are we guaranteed that at this point of time the
>>>>>>> drm_gpuvm_bo is not
>>>>>>> on the
>>>>>>> evicted list? Because otherwise we would call
>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>> with the
>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>> might drop the last reference to the drm_gem_object and
>>>>>>> hence we'd
>>>>>>> potentially
>>>>>>> free the dma-resv lock while holding it, at least if it's
>>>>>>> an external
>>>>>>> object.
>>>>>>
>>>>>> Easiest way in this scheme is to think of the lists as being
>>>>>> protected
>>>>>> by the vm's resv lock. That means anybody calling unlink()
>>>>>> must also
>>>>>> hold the vm's resv lock. (Which is OK from a UAF point of view,
>>>>>> but perhaps not from a locking inversion POV with an async list
>>>>>> update).
>>>>>
>>>>> This would mean that on unlink() we'd need to hold the VM's
>>>>> resv lock and the
>>>>> corresponding GEM's resv lock (in case they're not the same
>>>>> anyway) because the
>>>>> VM's resv lock would protect the external / evicted object
>>>>> lists and the GEM
>>>>> object's resv lock protects the GEM's list of drm_gpuvm_bos and
>>>>> the
>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>
>>>> As mentioned below the same applies for drm_gpuvm_bo_put() since
>>>> it might
>>>> destroy the vm_bo, which includes removing the vm_bo from
>>>> external / evicted
>>>> object lists and the GEM's list of vm_bos.
>>>>
>>>> As mentioned, if the GEM's dma-resv is different from the VM's
>>>> dma-resv we need
>>>> to take both locks. Ultimately, this would mean we need a
>>>> drm_exec loop, because
>>>> we can't know the order in which to take these locks. Doing a
>>>> full drm_exec loop
>>>> just to put() a vm_bo doesn't sound reasonable to me.
>>>>
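(Just to illustrate the kind of dance I mean; this is not a proposal,
and drm_gpuvm_resv_obj() / vm_bo->obj are placeholders for "the VM's
resv GEM object" and "the vm_bo's GEM object":

        struct drm_exec exec;
        int ret;

        drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
        drm_exec_until_all_locked(&exec) {
                /* Take both resv locks; drm_exec handles the ww-mutex
                 * backoff, so we don't need to know a safe locking
                 * order up front. */
                ret = drm_exec_lock_obj(&exec, drm_gpuvm_resv_obj(gpuvm));
                drm_exec_retry_on_contention(&exec);
                if (ret)
                        break;

                ret = drm_exec_lock_obj(&exec, vm_bo->obj);
                drm_exec_retry_on_contention(&exec);
                if (ret)
                        break;
        }
        if (!ret)
                drm_gpuvm_bo_put(vm_bo);
        drm_exec_fini(&exec);

All of that just to drop a reference.)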
>>>> Can we instead just have an internal mutex for locking the lists
>>>> such that we
>>>> avoid taking and dropping the spinlocks, which we use currently,
>>>> in a loop?
>>>
>>> You'd have the same locking inversion problem with a mutex, right?
>>> Since in the eviction path you have resv->mutex, from exec you have
>>> resv->mutex->resv because validate would attempt to grab resv.
>>
>> Both lists, evict and extobj, would need to have a separate mutex,
>> not a common one.
>> We'd also need a dedicated GEM gpuva lock. Then the only rule would
>> be that you can't
>> hold the dma-resv lock when calling put(). Which I admit is not that
>> nice.
>>
>> With the current spinlock solution drivers wouldn't need to worry
>> about anything locking-related though. So maybe I'll come back to your
>> proposal of having a switch for external locking with dma-resv locks
>> entirely, such that with external dma-resv locking I skip all the
>> spinlocks and add lockdep checks instead.
>>
>> I think that makes the most sense in terms of taking advantage of
>> external dma-resv locking
>> where possible and on the other hand having a self-contained solution
>> if not. This should
>> get all concerns out of the way, yours, Christian's and Boris'.
>
> If we need additional locks, yes, I'd prefer the opt-in/opt-out spinlock
> solution, and check back after a while to see if we can remove either
> option once most pitfalls are hit.
Sounds good, I'll prepare this for a V4.
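Roughly, for the list helpers I'd build on your gpuvm_cond_spin_lock()
idea below; just a sketch, the field and helper names are placeholders
and not necessarily what V4 will end up with:

        static void drm_gpuvm_extobj_list_lock(struct drm_gpuvm *gpuvm)
        {
                /* Drivers opting in to external dma-resv locking only
                 * get a lockdep assert, otherwise GPUVM falls back to
                 * its internal spinlock. (Names are placeholders.) */
                if (gpuvm->resv_protected_lists)
                        dma_resv_assert_held(drm_gpuvm_resv(gpuvm));
                else
                        spin_lock(&gpuvm->extobj.lock);
        }

        static void drm_gpuvm_extobj_list_unlock(struct drm_gpuvm *gpuvm)
        {
                if (!gpuvm->resv_protected_lists)
                        spin_unlock(&gpuvm->extobj.lock);
        }

The evicted list would get the same treatment with its own lock.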
- Danilo
>
> Thanks,
> /Thomas
>
>
>>
>>>
>>> That said, xe currently indeed does the vm+bo exec dance on vma
>>> put.
>>>
>>> One reason why that seemingly horrible construct is good is that
>>> when evicting an extobj and you need to access individual vmas to
>>> zap page-table entries or flush TLBs, those VMAs are not allowed to
>>> go away (we're not refcounting them). Holding the bo resv on gpuva
>>> put prevents that from happening. Possibly one could use another
>>> mutex to protect the gem->vm_bo list to achieve the same, but we'd
>>> need to hold it on gpuva put.
>>>
>>> /Thomas
>>>
>>>
>>>>
>>>> - Danilo
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> For extobjs an outer lock would be enough in case of
>>>>>>>>> Xe, but I
>>>>>>>>> really would not
>>>>>>>>> like to add even more complexity just to get the
>>>>>>>>> spinlock out of
>>>>>>>>> the way in case
>>>>>>>>> the driver already has an outer lock protecting this
>>>>>>>>> path.
>>>>>>>>
>>>>>>>> I must disagree here. These spinlocks and atomic
>>>>>>>> operations are
>>>>>>>> pretty
>>>>>>>> costly and as discussed earlier this type of locking was
>>>>>>>> the reason
>>>>>>>> (at
>>>>>>>> least according to the commit message) that made
>>>>>>>> Christian drop the
>>>>>>>> XArray
>>>>>>>> use in drm_exec for the same set of objects: "The locking
>>>>>>>> overhead
>>>>>>>> is
>>>>>>>> unecessary and measurable". IMHO the spinlock is the
>>>>>>>> added
>>>>>>>> complexity and a
>>>>>>>> single wide lock following the drm locking guidelines set
>>>>>>>> out by
>>>>>>>> Daniel and
>>>>>>>> David should really be the default choice with an opt-in
>>>>>>>> for a
>>>>>>>> spinlock if
>>>>>>>> needed for async and pushing out to a wq is not an
>>>>>>>> option.
>>>>>>>
>>>>>>> For the external object list an outer lock would work as
>>>>>>> long as it's
>>>>>>> not the
>>>>>>> dma-resv lock of the corresponding GEM object, since here
>>>>>>> we actually
>>>>>>> need to
>>>>>>> remove the list entry from the external object list on
>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>> It's just a bit weird design-wise that drivers would need
>>>>>>> to take
>>>>>>> this outer
>>>>>>> lock on:
>>>>>>>
>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>> - drm_gpuvm_bo_destroy() (and hence also
>>>>>>> drm_gpuvm_bo_put())
>>>>>>> - drm_gpuva_unlink() (because it needs to call
>>>>>>> drm_gpuvm_bo_put())
>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>
>>>>>>> Given that it seems reasonable to do all the required
>>>>>>> locking
>>>>>>> internally.
>>>>>>
>>>>>> From a design POV, there has been a clear direction in XE to
>>>>>> make
>>>>>> things similar to mmap() / munmap(), so this outer lock,
>>>>>> which in Xe is
>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's
>>>>>> protecting
>>>>>> the page-table structures and vma rb tree, the userptr
>>>>>> structures and
>>>>>> the extobj list. Basically it's taken early in the exec
>>>>>> IOCTL, the
>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault
>>>>>> handler, so
>>>>>> all of the above are just asserting that it is taken in the
>>>>>> correct
>>>>>> mode.
>>>>>>
>>>>>> But strictly with this scheme one could also use the vm's
>>>>>> dma_resv for
>>>>>> the extobj list since with drm_exec, it's locked before
>>>>>> traversing the
>>>>>> list.
>>>>>>
>>>>>> The whole point of this scheme is to rely on locks that you are
>>>>>> already supposed to be holding for various reasons, and it is
>>>>>> simple to comprehend.
>>>>>
>>>>> I don't agree that we're supposed to hold the VM's resv lock
>>>>> anyway for
>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but
>>>>> I'm fine using it
>>>>> for that purpose nevertheless.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> In order to at least place lockdep checks, the driver would
>>>>>>> need to
>>>>>>> supply the
>>>>>>> corresponding lock's lockdep_map, because the GPUVM
>>>>>>> otherwise doesn't
>>>>>>> know about
>>>>>>> the lock.
>>>>>>
>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>
>>>>> I'd really like to avoid that, especially now that everything
>>>>> got simpler. We
>>>>> should define the actual locks to take instead.
>>>>>
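(For reference, asserting against a driver-supplied lockdep_map would
look roughly like the below; the field name is made up. I'd rather
assert the concrete lock, e.g. via dma_resv_assert_held(), instead:

        #ifdef CONFIG_LOCKDEP
                /* gpuvm->extobj_lockdep_map would need to be supplied
                 * by the driver, which is the part I'd like to avoid. */
                lockdep_assert(lock_is_held(gpuvm->extobj_lockdep_map));
        #endif
)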
>>>>>>
>>>>>>>
>>>>>>> Out of curiosity, what is the overhead of a spin_lock()
>>>>>>> that doesn't
>>>>>>> need to
>>>>>>> spin?
>>>>>>
>>>>>> I guess it's hard to tell exactly, but it is much lower on
>>>>>> modern x86
>>>>>> than what it used to be. Not sure about ARM, which is the
>>>>>> other
>>>>>> architecture important to us. I figure if there is little
>>>>>> cache-line
>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>
>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>                                  spinlock_t *lock)
>>>>>>>> {
>>>>>>>>         if (!gpuvm->resv_protected_lists)
>>>>>>>>                 spin_lock(lock);
>>>>>>>> }
>>>>>>>>
>>>>>>>>>> For such drivers, that would require anybody calling
>>>>>>>>>> unlink to
>>>>>>>>>> hold the vm's
>>>>>>>>>> resv, though.
>>>>>>>>> In V4 I want to go back to having a dedicated lock for
>>>>>>>>> the GEM's
>>>>>>>>> gpuva list (or
>>>>>>>>> VM_BO list to be more precise). We can't just use the
>>>>>>>>> dma-resv
>>>>>>>>> lock for that
>>>>>>>>> with VM_BO abstractions, because on destruction of a
>>>>>>>>> VM_BO we
>>>>>>>>> otherwise wouldn't
>>>>>>>>> be allowed to already hold the dma-resv lock. That's
>>>>>>>>> the fix I
>>>>>>>>> was referring to
>>>>>>>>> earlier.
>>>>>>>>
>>>>>>>> Yeah, I can see the need for a dedicated lock for the
>>>>>>>> GEM's gpuva
>>>>>>>> list, but
>>>>>>>> holding the vm's dma-resv lock across the unlink
>>>>>>>> shouldn't be a
>>>>>>>> problem. We
>>>>>>>> may free the object and a pointer to the vm's resv during
>>>>>>>> unlink
>>>>>>>> but we
>>>>>>>> don't free the vm's resv. It'd be a matter of ensuring
>>>>>>>> that any
>>>>>>>> calls to
>>>>>>>> unlink from *within* drm_gpuvm allow it to be held.
>>>>>>>
>>>>>>> Drivers calling unlink() from the fence signaling path
>>>>>>> can't use the
>>>>>>> VM's
>>>>>>> dma-resv lock.
>>>>>>
>>>>>> Yes, that made me a bit curious because in the current
>>>>>> version the code
>>>>>> required the object's dma_resv for unlink(), which can't be
>>>>>> grabbed from the fence signaling path either. So are there any
>>>>>> drivers actually
>>>>>> wanting to do that? If so, they will either need to resort to
>>>>>> the
>>>>>> current spinlock solution or they will need to call unlink
>>>>>> from a
>>>>>> workqueue item.
>>>>>
>>>>> As Boris already mentioned we have the dma-resv lock by default
>>>>> or a driver
>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the
>>>>> latter.
>>>>>
>>>>>>>
>>>>>>> Also, what if the object is an external object? We can't
>>>>>>> use the VM's
>>>>>>> dma-resv
>>>>>>> lock here.
>>>>>>
>>>>>> Why? Typically (sync) unlink is only ever called from an
>>>>>> unbind-like
>>>>>> operation where it should be trivial to grab the vm's resv.
>>>>>> Or, for that matter, any outer lock protecting the extobj list.
>>>>>> The rule would be that drm_gpuvm_bo::entry::extobj and
>>>>>> drm_gpuvm_bo::entry::evict are protected by the vm's dma_resv (or
>>>>>> possibly an outer lock in the case of the extobj list).
>>>>>
>>>>> An outer lock wouldn't have worked for updates in the async path,
>>>>> but that shouldn't be relevant anymore. We could use the VM's resv
>>>>> for that.
>>>>>
>>>>>>
>>>>>>> And we can't have the GEM objs dma-resv lock held when
>>>>>>> calling
>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if
>>>>>>> the
>>>>>>> refcount drops
>>>>>>> to zero calls drm_gpuvm_bo_destroy() and
>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>> drop the
>>>>>>> last reference of the GEM object.
>>>>>>
>>>>>> Yes, but this is a different problem as to what exactly
>>>>>> protects
>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal
>>>>>> per bo list
>>>>>> lock, or if we want to keep the bo's dma_resv we need to
>>>>>> ensure that
>>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts
>>>>>> its obj
>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's
>>>>>> refcount (I know
>>>>>> Boris didn't like that, but requiring an explicit refcount
>>>>>> for a
>>>>>> pointer you dereference unless you're under a lock that
>>>>>> ensures keeping
>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal
>>>>>> spinlock)
>>>>>> I don't have a strong preference.
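(FWIW, the explicit refcount pattern you describe would be something
like the following; vm_bo->obj stands in for whatever field holds the
GEM pointer:

        struct drm_gem_object *obj = vm_bo->obj;

        /* Keep the GEM object alive explicitly, since dropping the
         * last vm_bo reference may drop the GEM reference it holds and
         * hence free the resv we're still holding. */
        drm_gem_object_get(obj);
        drm_gpuvm_bo_put(vm_bo);
        dma_resv_unlock(obj->resv);
        drm_gem_object_put(obj);
)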
>>>>>
>>>>> We can keep the GEM object's dma-resv lock; however, as mentioned
>>>>> above, drm_gpuva_unlink() and drm_gpuvm_bo_put() then require both
>>>>> the VM's resv lock and the GEM's resv lock in case they differ.
>>>>>
>>>>
>>>>>>>>
>>>>
>>>
>>
>