Message-ID: <f7a64e17-d8b0-a20a-4e27-46f448a10bd4@linux.intel.com>
Date:   Fri, 10 Nov 2023 10:39:59 +0100
From:   Thomas Hellström 
        <thomas.hellstrom@...ux.intel.com>
To:     Christian König <christian.koenig@....com>,
        Danilo Krummrich <dakr@...hat.com>
Cc:     airlied@...il.com, daniel@...ll.ch, matthew.brost@...el.com,
        sarah.walker@...tec.com, donald.robson@...tec.com,
        boris.brezillon@...labora.com, faith@...strand.net,
        dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH drm-misc-next v8 09/12] drm/gpuvm: reference count
 drm_gpuvm structures


On 11/10/23 09:50, Christian König wrote:
> Am 09.11.23 um 19:34 schrieb Danilo Krummrich:
>> On 11/9/23 17:03, Christian König wrote:
>>> Am 09.11.23 um 16:50 schrieb Thomas Hellström:
>>>> [SNIP]
>>>>>>
>>>> Did we get any resolution on this?
>>>>
>>>> FWIW, my take on this is that it would be possible to get GPUVM to 
>>>> work both with and without internal refcounting; if with, the 
>>>> driver needs a vm close to resolve cyclic references, if without, 
>>>> that's not necessary. If GPUVM is allowed to refcount its mappings 
>>>> and vm_bos, that comes with a slight performance drop, but as Danilo 
>>>> pointed out, the VM lifetime problem when iterating over a vm_bo's 
>>>> mappings becomes much easier and the code thus becomes easier to 
>>>> maintain moving forward. That convinced me it's a good thing.
>>>
>>> I strongly believe you guys stumbled over one of the core problems 
>>> with the VM here, and I think that reference counting is the right 
>>> answer for solving it.
>>>
>>> The big question is what is reference counted and in which 
>>> direction the dependency points; e.g. we have VM, BO, 
>>> BO_VM and Mapping objects here.
>>>
>>> Those patches here suggest a counted Mapping -> VM reference and I'm 
>>> pretty sure that this isn't a good idea. What we should really have 
>>> instead is a BO -> VM or BO_VM -> VM reference; in other words, 
>>> each BO which is part of the VM keeps a reference to the VM.
>>
>> We have both. Please see the subsequent patch introducing VM_BO 
>> structures for that.
>>
>> As I explained, mappings (struct drm_gpuva) keep a pointer to the 
>> VM they're mapped in, and besides that, it doesn't make sense to 
>> free a VM that still contains mappings; the reference count ensures 
>> that. This simply ensures memory safety.
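
(For context, as I understand the series this is the usual kref
pattern, roughly:

	/* on map: each mapping takes a counted reference on its VM */
	va->vm = drm_gpuvm_get(gpuvm);

	/* on unmap: drop it again, so the VM can't be freed while
	 * mappings remain */
	drm_gpuvm_put(va->vm);

modulo the exact helper names the series ends up with.)
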
>>
>>>
>>> BTW: At least in amdgpu we can have BOs which (temporarily) don't 
>>> have any mappings, but are still considered part of the VM.
>>
>> That should be possible.
>>
>>>
>>>>
>>>> Another issue Christian brought up is that something intended to be 
>>>> embeddable (a base class) shouldn't really have its own refcount. I 
>>>> think that's a valid point. If you at some point need to derive 
>>>> from multiple such structs, each having its own refcount, things 
>>>> will start to get weird. One way to resolve that would be to have 
>>>> the driver's subclass provide get() and put() ops and export a 
>>>> destructor for the base-class, rather than have the base-class 
>>>> provide the refcount and a destructor op.
>>
>> GPUVM simply follows the same pattern we have with drm_gem_objects, 
>> and I think it makes sense. Why would we want to embed two struct 
>> drm_gpuvm in a single driver structure?
>
> Because you need one drm_gpuvm structure for each application using 
> the driver? Or am I missing something?
>
> As far as I can see, a driver would want to embed that into its fpriv 
> structure, which is allocated during the drm_driver.open callback.

I was thinking more of the general design of a base-class that needs to 
be refcounted. Say a driver vm that inherits from gpu-vm, gem_object and 
yet another base-class that supplies its own refcount. What's the 
best-practice way to do refcounting? All base-classes supplying a 
refcount of their own, or the subclass supplying a refcount and the 
base-classes supplying destroy helpers?
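
For illustration, the latter could look roughly like this (hypothetical
sketch with made-up driver names; drm_gpuvm_destroy() stands in for an
exported base-class destructor that doesn't exist today):

	struct my_vm {
		struct kref kref;		/* single refcount, owned by the subclass */
		struct drm_gpuvm gpuvm;		/* embedded base-class */
		struct work_struct free_work;	/* INIT_WORK() at creation time */
	};

	static void my_vm_free_work(struct work_struct *work)
	{
		struct my_vm *vm = container_of(work, struct my_vm, free_work);

		/* base-classes only export destructors ... */
		drm_gpuvm_destroy(&vm->gpuvm);
		kfree(vm);
	}

	static void my_vm_release(struct kref *kref)
	{
		struct my_vm *vm = container_of(kref, struct my_vm, kref);

		/* ... so the driver decides the context freeing runs in,
		 * e.g. a worker when put() may be called from atomic
		 * context but the destructors sleep. */
		queue_work(system_wq, &vm->free_work);
	}

	static void my_vm_put(struct my_vm *vm)
	{
		kref_put(&vm->kref, my_vm_release);
	}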

But to be clear this is nothing I see needing urgent attention.

>
>>
>>>
>>> Well, I have never seen stuff like that in the kernel. Might be that 
>>> this works, but I would rather not try it if avoidable.
>>>
>>>>
>>>> That would also make it possible for the driver to decide the 
>>>> context for the put() call: if the driver needs to be able to call 
>>>> put() from irq / atomic context but the base-class's destructor 
>>>> doesn't allow atomic context, the driver can push freeing out to a 
>>>> work item if needed.
>>>>
>>>> Finally, the refcount overflow Christian pointed out. Limiting the 
>>>> number of mappings sounds like a reasonable remedy to me.
>>>
>>> Well, that depends; I would rather avoid having a dependency for 
>>> mappings.
>>>
>>> Taking the CPU VM handling as an example: as far as I know, 
>>> vm_area_structs don't grab a reference to their mm_struct either. 
>>> Instead they get automatically destroyed when the mm_struct is 
>>> destroyed.
>>
>> Certainly, that would be possible. However, thinking about it, this 
>> might invite huge trouble.
>>
>> First of all, we'd still need to reference count a GPUVM and take a 
>> reference for each VM_BO, as we do already. Now, instead of simply 
>> increasing the reference count for each mapping as well, we'd need a 
>> *mandatory* driver callback that is called when the GPUVM reference 
>> count drops to zero, maybe something like vm_destroy().
>>
>> The reason is that GPUVM can't just remove all mappings from the tree 
>> nor can it free them
>> by itself, since drivers might use them for tracking their allocated 
>> page tables and/or
>> other stuff.
>>
>> Now, let's think about the scope this callback might be called from. 
>> When a VM_BO is destroyed, the driver might hold a couple of locks 
>> (for Xe it would be the VM's shared dma-resv lock and potentially 
>> the corresponding object's dma-resv lock, if they're not the same 
>> already). If destroying this VM_BO leads to the VM being destroyed, 
>> the driver's vm_destroy() callback would be called with those locks 
>> held as well.
>>
>> I feel like doing this opens the doors to locking hell entirely. I 
>> think we should really avoid that.

I don't think we need to worry much about this particular locking hell, 
because if we hold, for example, a vm and bo resv when putting the vm_bo, 
we need to keep additional strong references for the bo / vm pointers we 
use for unlocking. Hence, putting the vm_bo under those locks can never 
lead to the vm getting destroyed.
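
Concretely, something along these lines (sketch; helper names quoted
from memory and may differ from the series):

	struct drm_gpuvm *vm = drm_gpuvm_get(vm_bo->vm);

	/* the vm resv (and possibly the bo resv) is held here */
	drm_gpuvm_bo_put(vm_bo);

	dma_resv_unlock(drm_gpuvm_resv(vm));

	/* only at this point may the vm actually go away */
	drm_gpuvm_put(vm);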

Also, don't we already sort of have a mandatory vm_destroy callback?

+	if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free))
+		return;
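
That is, every driver already has to supply something like the below
(made-up driver names):

	static void my_vm_free(struct drm_gpuvm *gpuvm)
	{
		struct my_vm *vm = container_of(gpuvm, struct my_vm, gpuvm);

		kfree(vm);
	}

	static const struct drm_gpuvm_ops my_vm_ops = {
		.vm_free = my_vm_free,
	};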



>
> That's a really good point, but I fear that's exactly the use case.
>
> I would expect that VM_BO structures are added in the 
> drm_gem_object_funcs.open callback and freed in 
> drm_gem_object_funcs.close.
>
> Since it is perfectly legal for userspace to close a BO while there 
> are still mappings (it can trivially happen that the app is killed), I 
> would expect the drm_gem_object_funcs.close handling to be something 
> like asking drm_gpuvm to destroy the VM_BO and getting back the 
> mappings which should be cleared from the page tables.
>
> In amdgpu we even go a step further: the VM structure keeps track 
> of all the mappings of deleted VM_BOs, so that higher levels can query 
> those and clear them later on.
>
> Background is that drm_gem_object_funcs.close can't fail, but it 
> can perfectly well be that the app is killed because of an OOM 
> situation, and we can't do page table updates at that moment because 
> of this.
>
>>
>>>
>>> Which makes sense in that case because when the mm_struct is gone 
>>> the vm_area_struct doesn't make sense any more either.
>>>
>>> What we clearly need is a reference to prevent the VM, or at least 
>>> the shared resv, from going away too early.
>>
>> Yeah, that was a good hint and we've covered that.
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> But I think all of this is fixable as follow-ups if needed, unless 
>>>> I'm missing something crucial.
>>
>> Fully agree, I think at this point we should go ahead and land this 
>> series.

+1.

/Thomas


>>
>
> Yeah, agree this is not UAPI, so it's not set in stone. Feel free to 
> add my acked-by as well if you want.
>
> Only keep in mind that when you give drivers some functionality in a 
> common component they usually expect to keep that functionality.
>
> For example, changing the dma_resv object to make sure that drivers 
> can't cause use-after-free errors any more was an extremely annoying 
> experience, since every user of those interfaces had to change at once.
>
> Regards,
> Christian.
>
>>
>>>>
>>>> Just my 2 cents.
>>>>
>>>> /Thomas
>>>>
>>>>
>>>
>>
>
