[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7b5b3a51-6b70-b5ba-1017-b79f1519ed09@linux.intel.com>
Date: Fri, 10 Nov 2023 11:52:14 +0100
From: Thomas Hellström
<thomas.hellstrom@...ux.intel.com>
To: Christian König <christian.koenig@....com>,
Danilo Krummrich <dakr@...hat.com>
Cc: airlied@...il.com, daniel@...ll.ch, matthew.brost@...el.com,
sarah.walker@...tec.com, donald.robson@...tec.com,
boris.brezillon@...labora.com, faith@...strand.net,
dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH drm-misc-next v8 09/12] drm/gpuvm: reference count
drm_gpuvm structures
On 11/10/23 11:42, Christian König wrote:
> Am 10.11.23 um 10:39 schrieb Thomas Hellström:
>>
>> [SNIP]
>
>> I was thinking more of the general design of a base-class that needs
>> to be refcounted. Say a driver vm that inherits from gpu-vm,
>> gem_object and yet another base-class that supplies its own refcount.
>> What's the best-practice way to do refcounting? All base-classes
>> supplying a refcount of its own, or the subclass supplying a refcount
>> and the base-classes supply destroy helpers.
>
> From my experience the most common design pattern in the Linux kernel
> is that you either have reference counted objects which contain a
> private pointer (like struct file, struct inode etc..) or the lifetime
> is defined by the user of the object instead of reference counting and
> in this case you can embed it into your own object.
>
>>
>> But to be clear this is nothing I see needing urgent attention.
>>
>>>
>>>>
>>>>>
>>>>> Well, I have never seen stuff like that in the kernel. Might be
>>>>> that this works, but I would rather not try if avoidable.
>>>>>
>>>>>>
>>>>>> That would also make it possible for the driver to decide the
>>>>>> context for the put() call: If the driver needs to be able to
>>>>>> call put() from irq / atomic context but the base-class'es
>>>>>> destructor doesn't allow atomic context, the driver can push
>>>>>> freeing out to a work item if needed.
>>>>>>
>>>>>> Finally, the refcount overflow Christian pointed out. Limiting
>>>>>> the number of mapping sounds like a reasonable remedy to me.
>>>>>
>>>>> Well that depends, I would rather avoid having a dependency for
>>>>> mappings.
>>>>>
>>>>> Taking the CPU VM handling as example as far as I know
>>>>> vm_area_structs doesn't grab a reference to their mm_struct
>>>>> either. Instead they get automatically destroyed when the
>>>>> mm_struct is destroyed.
>>>>
>>>> Certainly, that would be possible. However, thinking about it, this
>>>> might call for
>>>> huge trouble.
>>>>
>>>> First of all, we'd still need to reference count a GPUVM and take a
>>>> reference for each
>>>> VM_BO, as we do already. Now instead of simply increasing the
>>>> reference count for each
>>>> mapping as well, we'd need a *mandatory* driver callback that is
>>>> called when the GPUVM
>>>> reference count drops to zero. Maybe something like vm_destroy().
>>>>
>>>> The reason is that GPUVM can't just remove all mappings from the
>>>> tree nor can it free them
>>>> by itself, since drivers might use them for tracking their
>>>> allocated page tables and/or
>>>> other stuff.
>>>>
>>>> Now, let's think about the scope this callback might be called
>>>> from. When a VM_BO is destroyed
>>>> the driver might hold a couple of locks (for Xe it would be the
>>>> VM's shared dma-resv lock and
>>>> potentially the corresponding object's dma-resv lock if they're not
>>>> the same already). If
>>>> destroying this VM_BO leads to the VM being destroyed, the drivers
>>>> vm_destroy() callback would
>>>> be called with those locks being held as well.
>>>>
>>>> I feel like doing this finally opens the doors of the locking hell
>>>> entirely. I think we should
>>>> really avoid that.
>>
>> I don't think we need to worry much about this particular locking
>> hell because if we hold
>
> I have to agree with Danilo here. Especially you have cases where you
> usually lock BO->VM (for example eviction) as well as cases where you
> need to lock VM->BO (command submission).
>
> Because of this in amdgpu we used (or abused?) the dma_resv of the
> root BO as lock for the VM. Since this is a ww_mutex locking it in
> both VM, BO as well as BO, VM order works.
Yes, gpuvm is doing the same. (although not necessarily using the
page-table root bo, but any bo of the driver's choice). But I read it as
Danilo feared the case where the VM destructor was called with a VM resv
(or possibly bo resv) held. I meant the driver can easily ensure that's
not happening, and in some cases it can't happen.
Thanks,
Thomas
>
> Regards,
> Christian.
>
>> , for example a vm and bo resv when putting the vm_bo, we need to
>> keep additional strong references for the bo / vm pointer we use for
>> unlocking. Hence putting the vm_bo under those locks can never lead
>> to the vm getting destroyed.
>>
>> Also, don't we already sort of have a mandatory vm_destroy callback?
>>
>> + if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free))
>> + return;
>>
>>
>>
>>>
>>> That's a really good point, but I fear exactly that's the use case.
>>>
>>> I would expect that VM_BO structures are added in the
>>> drm_gem_object_funcs.open callback and freed in
>>> drm_gem_object_funcs.close.
>>>
>>> Since it is perfectly legal for userspace to close a BO while there
>>> are still mappings (can trivial be that the app is killed) I would
>>> expect that the drm_gem_object_funcs.close handling is something
>>> like asking drm_gpuvm destroying the VM_BO and getting the mappings
>>> which should be cleared in the page table in return.
>>>
>>> In amdgpu we even go a step further and the VM structure keeps track
>>> of all the mappings of deleted VM_BOs so that higher level can query
>>> those and clear them later on.
>>>
>>> Background is that the drm_gem_object_funcs.close can't fail, but it
>>> can perfectly be that the app is killed because of an OOM situation
>>> and we can't do page tables updates in that moment because of this.
>>>
>>>>
>>>>>
>>>>> Which makes sense in that case because when the mm_struct is gone
>>>>> the vm_area_struct doesn't make sense any more either.
>>>>>
>>>>> What we clearly need is a reference to prevent the VM or at least
>>>>> the shared resv to go away to early.
>>>>
>>>> Yeah, that was a good hint and we've covered that.
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> But I think all of this is fixable as follow-ups if needed,
>>>>>> unless I'm missing something crucial.
>>>>
>>>> Fully agree, I think at this point we should go ahead and land this
>>>> series.
>>
>> +1.
>>
>> /Thomas
>>
>>
>>>>
>>>
>>> Yeah, agree this is not UAPI so not nailed in stone. Feel free to
>>> add my acked-by as well if you want.
>>>
>>> Only keep in mind that when you give drivers some functionality in a
>>> common component they usually expect to keep that functionality.
>>>
>>> For example changing the dma_resv object to make sure that drivers
>>> can't cause use after free errors any more was an extremely annoying
>>> experience since every user of those interface had to change at once.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>>>
>>>>>> Just my 2 cents.
>>>>>>
>>>>>> /Thomas
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
Powered by blists - more mailing lists