Message-ID: <f7a64e17-d8b0-a20a-4e27-46f448a10bd4@linux.intel.com>
Date: Fri, 10 Nov 2023 10:39:59 +0100
From: Thomas Hellström
<thomas.hellstrom@...ux.intel.com>
To: Christian König <christian.koenig@....com>,
Danilo Krummrich <dakr@...hat.com>
Cc: airlied@...il.com, daniel@...ll.ch, matthew.brost@...el.com,
sarah.walker@...tec.com, donald.robson@...tec.com,
boris.brezillon@...labora.com, faith@...strand.net,
dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH drm-misc-next v8 09/12] drm/gpuvm: reference count
drm_gpuvm structures
On 11/10/23 09:50, Christian König wrote:
> Am 09.11.23 um 19:34 schrieb Danilo Krummrich:
>> On 11/9/23 17:03, Christian König wrote:
>>> Am 09.11.23 um 16:50 schrieb Thomas Hellström:
>>>> [SNIP]
>>>> Did we get any resolution on this?
>>>>
>>>> FWIW, my take on this is that it would be possible to get GPUVM to
>>>> work both with and without internal refcounting; if with, the
>>>> driver needs a vm close to resolve cyclic references, if without,
>>>> that's not necessary. If GPUVM is allowed to refcount its mappings
>>>> and vm_bos, that comes with a slight performance drop, but as
>>>> Danilo pointed out, the VM lifetime problem when iterating over a
>>>> vm_bo's mappings becomes much easier to handle and the code thus
>>>> becomes easier to maintain moving forward. That convinced me it's a
>>>> good thing.
>>>
>>> I strongly believe you guys stumbled over one of the core problems
>>> with the VM here, and I think that reference counting is the right
>>> way to solve this.
>>>
>>> The big question is what is reference counted and in which
>>> direction the dependency points; e.g. we have VM, BO, BO_VM and
>>> Mapping objects here.
>>>
>>> Those patches suggest a counted Mapping -> VM reference, and I'm
>>> pretty sure that this isn't a good idea. What we should rather have
>>> is a BO -> VM or BO_VM -> VM reference. In other words, each BO
>>> which is part of the VM keeps a reference to the VM.
>>
>> We have both. Please see the subsequent patch introducing VM_BO
>> structures for that.
>>
>> As I explained, mappings (struct drm_gpuva) keep a pointer to the VM
>> they're mapped in, and besides that, it doesn't make sense to free a
>> VM that still contains mappings; the reference count ensures that.
>> This simply ensures memory safety.
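A minimal sketch of that pattern, assuming the kref-based
drm_gpuvm_get()/drm_gpuvm_put() helpers this series adds; the
driver-side function names are made up, and whether the get happens
inside drm_gpuva_insert() or in the driver is an implementation detail:

static int example_vm_map(struct drm_gpuvm *gpuvm, struct drm_gpuva *va)
{
        int ret;

        ret = drm_gpuva_insert(gpuvm, va);
        if (ret)
                return ret;

        /* The mapping pins the VM until it is removed again. */
        va->vm = drm_gpuvm_get(gpuvm);

        return 0;
}

static void example_vm_unmap(struct drm_gpuva *va)
{
        struct drm_gpuvm *gpuvm = va->vm;

        drm_gpuva_remove(va);
        drm_gpuvm_put(gpuvm);   /* may drop the last VM reference */
}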
>>
>>>
>>> BTW: At least in amdgpu we can have BOs which (temporarily) don't
>>> have any mappings, but are still considered part of the VM.
>>
>> That should be possible.
>>
>>>
>>>>
>>>> Another issue Christian brought up is that something intended to be
>>>> embeddable (a base class) shouldn't really have its own refcount. I
>>>> think that's a valid point. If you at some point need to derive
>>>> from multiple such structs, each having its own refcount, things
>>>> will start to get weird. One way to resolve that would be to have
>>>> the driver's subclass provide get() and put() ops, and export a
>>>> destructor for the base-class, rather than have the base-class
>>>> provide the refcount and a destructor op.
>>
>> GPUVM simply follows the same pattern we have with drm_gem_objects,
>> and I think it makes sense. Why would we want to embed two struct
>> drm_gpuvm instances in a single driver structure?
>
> Because you need one drm_gpuvm structure for each application using
> the driver? Or am I missing something?
>
> As far as I can see, a driver would want to embed that into its fpriv
> structure, which is allocated during the drm_driver.open callback.
I was thinking more of the general design of a base-class that needs to
be refcounted. Say a driver vm that inherits from gpu-vm, gem_object and
yet another base-class that supplies its own refcount. What's the
best-practice way to do refcounting? All base-classes supplying a
refcount of their own, or the subclass supplying a refcount and the
base-classes supplying destroy helpers?
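For illustration, the latter could look something like this sketch
(drm_gpuvm_destroy() as an exported base-class destructor is
hypothetical, and the gem_object part is left out for brevity):

struct example_vm {
        struct drm_gpuvm base;  /* base-class without own refcount */
        struct kref refcount;   /* single refcount, owned by the subclass */
        /* ... driver-private state ... */
};

static void example_vm_release(struct kref *kref)
{
        struct example_vm *evm = container_of(kref, typeof(*evm), refcount);

        /* Base-classes export destructors instead of refcounts. */
        drm_gpuvm_destroy(&evm->base);  /* hypothetical export */
        kfree(evm);
}

static void example_vm_put(struct example_vm *evm)
{
        kref_put(&evm->refcount, example_vm_release);
}

That would also leave the choice of put() context to the driver.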
But to be clear, this is nothing I see needing urgent attention.
>
>>
>>>
>>> Well, I have never seen stuff like that in the kernel. It might be
>>> that this works, but I would rather not try it if avoidable.
>>>
>>>>
>>>> That would also make it possible for the driver to decide the
>>>> context for the put() call: If the driver needs to be able to call
>>>> put() from irq / atomic context but the base-class's destructor
>>>> doesn't allow atomic context, the driver can push freeing out to a
>>>> work item if needed.
>>>>
>>>> Finally, the refcount overflow Christian pointed out. Limiting the
>>>> number of mappings sounds like a reasonable remedy to me.
>>>
>>> Well, that depends; I would rather avoid having a dependency for
>>> mappings.
>>>
>>> Taking the CPU VM handling as an example: as far as I know,
>>> vm_area_structs don't grab a reference to their mm_struct either.
>>> Instead they get automatically destroyed when the mm_struct is
>>> destroyed.
>>
>> Certainly, that would be possible. However, thinking about it, this
>> might cause huge trouble.
>>
>> First of all, we'd still need to reference count a GPUVM and take a
>> reference for each VM_BO, as we do already. Now, instead of simply
>> increasing the reference count for each mapping as well, we'd need a
>> *mandatory* driver callback that is called when the GPUVM reference
>> count drops to zero. Maybe something like vm_destroy().
>>
>> The reason is that GPUVM can't just remove all mappings from the
>> tree, nor can it free them by itself, since drivers might use them
>> for tracking their allocated page tables and/or other stuff.
>>
>> Now, let's think about the scope this callback might be called from.
>> When a VM_BO is destroyed, the driver might hold a couple of locks
>> (for Xe it would be the VM's shared dma-resv lock and potentially the
>> corresponding object's dma-resv lock, if they're not the same
>> already). If destroying this VM_BO leads to the VM being destroyed,
>> the driver's vm_destroy() callback would be called with those locks
>> held as well.
>>
>> I feel like doing this would open the doors to locking hell entirely.
>> I think we should really avoid that.
I don't think we need to worry much about this particular locking hell,
because if we hold, for example, a vm and bo resv when putting the
vm_bo, we need to keep additional strong references for the bo / vm
pointers we use for unlocking. Hence, putting the vm_bo under those
locks can never lead to the vm getting destroyed.
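I.e. roughly the following sketch, where the helper names are from this
series but the surrounding driver function is made up:

static void example_vm_bo_put_locked(struct drm_gpuvm_bo *vm_bo)
{
        /* Pin vm and bo so the resv we hold can't be freed under us. */
        struct drm_gpuvm *vm = drm_gpuvm_get(vm_bo->vm);
        struct drm_gem_object *obj = vm_bo->obj;

        drm_gem_object_get(obj);
        drm_gpuvm_bo_put(vm_bo);        /* can't drop the last vm reference */
        dma_resv_unlock(obj->resv);
        drm_gem_object_put(obj);
        drm_gpuvm_put(vm);              /* only here may the vm go away */
}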
Also, don't we already sort of have a mandatory vm_destroy callback?
+	if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free))
+		return;
>
> That's a really good point, but I fear that's exactly the use case.
>
> I would expect that VM_BO structures are added in the
> drm_gem_object_funcs.open callback and freed in
> drm_gem_object_funcs.close.
>
> Since it is perfectly legal for userspace to close a BO while there
> are still mappings (it can trivially be that the app is killed), I
> would expect the drm_gem_object_funcs.close handling to be something
> like asking drm_gpuvm to destroy the VM_BO and getting the mappings
> which should be cleared from the page table in return.
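Sketched below; this is purely hypothetical driver code, where only the
drm_gpuvm_bo_* and drm_gpuva_* helpers exist in this series, while
struct example_vm, to_example_va() and the deleted list are made up:

static void example_gem_close(struct drm_gem_object *obj,
                              struct drm_file *file)
{
        struct example_vm *evm = to_example_vm(file);
        struct drm_gpuvm_bo *vm_bo;
        struct drm_gpuva *va, *next;

        dma_resv_lock(obj->resv, NULL);
        vm_bo = drm_gpuvm_bo_find(&evm->gpuvm, obj);
        if (vm_bo) {
                drm_gpuvm_bo_for_each_va_safe(va, next, vm_bo) {
                        drm_gpuva_unlink(va);
                        /* close() can't fail, so queue the range for a
                         * deferred page-table update instead.
                         */
                        list_add_tail(&to_example_va(va)->deleted_entry,
                                      &evm->deleted);
                }
                drm_gpuvm_bo_put(vm_bo);        /* drops the find reference */
        }
        dma_resv_unlock(obj->resv);
}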
>
> In amdgpu we even go a step further and the VM structure keeps track
> of all the mappings of deleted VM_BOs so that higher levels can query
> those and clear them later on.
>
> Background is that drm_gem_object_funcs.close can't fail, but it can
> perfectly well be that the app is killed because of an OOM situation,
> and we can't do page table updates at that moment because of this.
>
>>
>>>
>>> Which makes sense in that case because when the mm_struct is gone
>>> the vm_area_struct doesn't make sense any more either.
>>>
>>> What we clearly need is a reference to prevent the VM, or at least
>>> the shared resv, from going away too early.
>>
>> Yeah, that was a good hint and we've covered that.
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> But I think all of this is fixable as follow-ups if needed, unless
>>>> I'm missing something crucial.
>>
>> Fully agree, I think at this point we should go ahead and land this
>> series.
+1.
/Thomas
>>
>
> Yeah, agree, this is not UAPI, so it's not nailed in stone. Feel free
> to add my acked-by as well if you want.
>
> Only keep in mind that when you give drivers some functionality in a
> common component they usually expect to keep that functionality.
>
> For example, changing the dma_resv object to make sure that drivers
> can't cause use-after-free errors any more was an extremely annoying
> experience, since every user of that interface had to change at once.
>
> Regards,
> Christian.
>
>>
>>>>
>>>> Just my 2 cents.
>>>>
>>>> /Thomas
>>>>
>>>>
>>>
>>
>