linux-kernel - Re: [PATCH drm-misc-next v8 09/12] drm/gpuvm: reference count drm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7b5b3a51-6b70-b5ba-1017-b79f1519ed09@linux.intel.com>
Date:   Fri, 10 Nov 2023 11:52:14 +0100
From:   Thomas Hellström 
        <thomas.hellstrom@...ux.intel.com>
To:     Christian König <christian.koenig@....com>,
        Danilo Krummrich <dakr@...hat.com>
Cc:     airlied@...il.com, daniel@...ll.ch, matthew.brost@...el.com,
        sarah.walker@...tec.com, donald.robson@...tec.com,
        boris.brezillon@...labora.com, faith@...strand.net,
        dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH drm-misc-next v8 09/12] drm/gpuvm: reference count
 drm_gpuvm structures


On 11/10/23 11:42, Christian König wrote:
> Am 10.11.23 um 10:39 schrieb Thomas Hellström:
>>
>> [SNIP]
>
>> I was thinking more of the general design of a base-class that needs 
>> to be refcounted. Say a driver vm that inherits from gpu-vm, 
>> gem_object and yet another base-class that supplies its own refcount. 
>> What's the best-practice way to do refcounting? All base-classes 
>> supplying a refcount of its own, or the subclass supplying a refcount 
>> and the base-classes supply destroy helpers.
>
> From my experience the most common design pattern in the Linux kernel 
> is that you either have reference counted objects which contain a 
> private pointer (like struct file, struct inode etc..) or the lifetime 
> is defined by the user of the object instead of reference counting and 
> in this case you can embed it into your own object.
>
>>
>> But to be clear this is nothing I see needing urgent attention.
>>
>>>
>>>>
>>>>>
>>>>> Well, I have never seen stuff like that in the kernel. Might be 
>>>>> that this works, but I would rather not try if avoidable.
>>>>>
>>>>>>
>>>>>> That would also make it possible for the driver to decide the 
>>>>>> context for the put() call: If the driver needs to be able to 
>>>>>> call put() from irq / atomic context but the base-class'es 
>>>>>> destructor doesn't allow atomic context, the driver can push 
>>>>>> freeing out to a work item if needed.
>>>>>>
>>>>>> Finally, the refcount overflow Christian pointed out. Limiting 
>>>>>> the number of mapping sounds like a reasonable remedy to me.
>>>>>
>>>>> Well that depends, I would rather avoid having a dependency for 
>>>>> mappings.
>>>>>
>>>>> Taking the CPU VM handling as example as far as I know 
>>>>> vm_area_structs doesn't grab a reference to their mm_struct 
>>>>> either. Instead they get automatically destroyed when the 
>>>>> mm_struct is destroyed.
>>>>
>>>> Certainly, that would be possible. However, thinking about it, this 
>>>> might call for
>>>> huge trouble.
>>>>
>>>> First of all, we'd still need to reference count a GPUVM and take a 
>>>> reference for each
>>>> VM_BO, as we do already. Now instead of simply increasing the 
>>>> reference count for each
>>>> mapping as well, we'd need a *mandatory* driver callback that is 
>>>> called when the GPUVM
>>>> reference count drops to zero. Maybe something like vm_destroy().
>>>>
>>>> The reason is that GPUVM can't just remove all mappings from the 
>>>> tree nor can it free them
>>>> by itself, since drivers might use them for tracking their 
>>>> allocated page tables and/or
>>>> other stuff.
>>>>
>>>> Now, let's think about the scope this callback might be called 
>>>> from. When a VM_BO is destroyed
>>>> the driver might hold a couple of locks (for Xe it would be the 
>>>> VM's shared dma-resv lock and
>>>> potentially the corresponding object's dma-resv lock if they're not 
>>>> the same already). If
>>>> destroying this VM_BO leads to the VM being destroyed, the drivers 
>>>> vm_destroy() callback would
>>>> be called with those locks being held as well.
>>>>
>>>> I feel like doing this finally opens the doors of the locking hell 
>>>> entirely. I think we should
>>>> really avoid that.
>>
>> I don't think we need to worry much about this particular locking 
>> hell because if we hold
>
> I have to agree with Danilo here. Especially you have cases where you 
> usually lock BO->VM (for example eviction) as well as cases where you 
> need to lock VM->BO (command submission).
>
> Because of this in amdgpu we used (or abused?) the dma_resv of the 
> root BO as lock for the VM. Since this is a ww_mutex locking it in 
> both VM, BO as well as BO, VM order works.

Yes, gpuvm is doing the same. (although not necessarily using the 
page-table root bo, but any bo of the driver's choice). But I read it as 
Danilo feared the case where the VM destructor was called with a VM resv 
(or possibly bo resv) held. I meant the driver can easily ensure that's 
not happening, and in some cases it can't happen.

Thanks,

Thomas



>
> Regards,
> Christian.
>
>> , for example a vm and bo resv when putting the vm_bo, we need to 
>> keep additional strong references for the bo / vm pointer we use for 
>> unlocking. Hence putting the vm_bo under those locks can never lead 
>> to the vm getting destroyed.
>>
>> Also, don't we already sort of have a mandatory vm_destroy callback?
>>
>> +    if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free))
>> +        return;
>>
>>
>>
>>>
>>> That's a really good point, but I fear exactly that's the use case.
>>>
>>> I would expect that VM_BO structures are added in the 
>>> drm_gem_object_funcs.open callback and freed in 
>>> drm_gem_object_funcs.close.
>>>
>>> Since it is perfectly legal for userspace to close a BO while there 
>>> are still mappings (can trivial be that the app is killed) I would 
>>> expect that the drm_gem_object_funcs.close handling is something 
>>> like asking drm_gpuvm destroying the VM_BO and getting the mappings 
>>> which should be cleared in the page table in return.
>>>
>>> In amdgpu we even go a step further and the VM structure keeps track 
>>> of all the mappings of deleted VM_BOs so that higher level can query 
>>> those and clear them later on.
>>>
>>> Background is that the drm_gem_object_funcs.close can't fail, but it 
>>> can perfectly be that the app is killed because of an OOM situation 
>>> and we can't do page tables updates in that moment because of this.
>>>
>>>>
>>>>>
>>>>> Which makes sense in that case because when the mm_struct is gone 
>>>>> the vm_area_struct doesn't make sense any more either.
>>>>>
>>>>> What we clearly need is a reference to prevent the VM or at least 
>>>>> the shared resv to go away to early.
>>>>
>>>> Yeah, that was a good hint and we've covered that.
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> But I think all of this is fixable as follow-ups if needed, 
>>>>>> unless I'm missing something crucial.
>>>>
>>>> Fully agree, I think at this point we should go ahead and land this 
>>>> series.
>>
>> +1.
>>
>> /Thomas
>>
>>
>>>>
>>>
>>> Yeah, agree this is not UAPI so not nailed in stone. Feel free to 
>>> add my acked-by as well if you want.
>>>
>>> Only keep in mind that when you give drivers some functionality in a 
>>> common component they usually expect to keep that functionality.
>>>
>>> For example changing the dma_resv object to make sure that drivers 
>>> can't cause use after free errors any more was an extremely annoying 
>>> experience since every user of those interface had to change at once.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>>>
>>>>>> Just my 2 cents.
>>>>>>
>>>>>> /Thomas
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>