linux-kernel - Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67958920-c5bb-a0f5-2306-e3ae4fdbaeb3@redhat.com>
Date:   Mon, 6 Feb 2023 14:27:31 +0100
From:   Danilo Krummrich <dakr@...hat.com>
To:     Christian König <christian.koenig@....com>,
        Dave Airlie <airlied@...il.com>
Cc:     Matthew Brost <matthew.brost@...el.com>, daniel@...ll.ch,
        corbet@....net, dri-devel@...ts.freedesktop.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        mripard@...nel.org, bskeggs@...hat.com, jason@...kstrand.net,
        nouveau@...ts.freedesktop.org, airlied@...hat.com
Subject: Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi
 interfaces

On 2/6/23 10:48, Christian König wrote:
> Am 02.02.23 um 19:31 schrieb Danilo Krummrich:
>> On 2/2/23 12:53, Christian König wrote:
>>> Am 01.02.23 um 09:10 schrieb Dave Airlie:
>>>> [SNIP]
>>>>>> For drivers that don't intend to merge at all and (somehow) are
>>>>>> capable of dealing with sparse regions without knowing the sparse
>>>>>> region's boundaries, it'd be easy to make those gpuva_regions 
>>>>>> optional.
>>>>> Yeah, but this then defeats the approach of having the same hw
>>>>> independent interface/implementation for all drivers.
>>>> I think you are running a few steps ahead here. The plan isn't to have
>>>> an independent interface, it's to provide a set of routines and
>>>> tracking that will be consistent across drivers, so that all drivers
>>>> once using them will operate in mostly the same fashion with respect
>>>> to GPU VA tracking and VA/BO lifetimes. Already in the tree we have
>>>> amdgpu and freedreno which I think end up operating slightly different
>>>> around lifetimes. I'd like to save future driver writers the effort of
>>>> dealing with those decisions and this should drive their user api
>>>> design so to enable vulkan sparse bindings.
>>>
>>> Ok in this case I'm pretty sure this is *NOT* a good idea.
>>>
>>> See this means that we define the UAPI implicitly by saying to 
>>> drivers to use a common framework for their VM implementation which 
>>> then results in behavior A,B,C,D....
>>>
>>> If a driver strides away from this common framework because it has 
>>> different requirements based on how his hw work you certainly get 
>>> different behavior again (and you have tons of hw specific 
>>> requirements in here).
>>>
>>> What we should do instead if we want to have some common handling 
>>> among drivers (which I totally agree on makes sense) then we should 
>>> define the UAPI explicitly.
>>
>> By asking that I don't want to say I'm against this idea, I'm just 
>> wondering how it becomes easier to deal with "tons of hw specific 
>> requirements" by generalizing things even more?
> 
> I'm already maintaining two different GPU VM solutions in the GPU 
> drivers in the kernel, radeon and amdgpu. The hw they driver is 
> identical, just the UAPI is different. And only because of the different 
> UAPI they can't have the same VM backend implementation.
> 
> The hw stuff is completely abstract able. That's just stuff you need to 
> consider when defining the structures you pass around.

Wouldn't we need to have strict limitations on that, such that HW 
specific structures / fields are not allowed to break the semantics of 
the UAPI? Because otherwise we wouldn't be able to attach generalized 
components to the unified UAPI which ultimately would be the whole 
purpose. So, if this consideration is correct, I'd still see a risk of 
drivers striding away from it because of their requirements. Again, I 
think a unified UAPI is a good idea, but it sounds more difficult to me 
than this last paragraph implies.

> 
> But a messed up UAPI is sometimes impossible to fix because of backward 
> compatibility.
> 
> We learned that the hard way with radeon and mostly fixed it by coming 
> up with a completely new implementation for amdgpu.
> 
>> What makes us think that we do a better job in considering all hw 
>> specific requirements with a unified UAPI than with a more lightweight 
>> generic component for tracking VA mappings?
> 
> Because this defines the UAPI implicitly and that's seldom a good idea.
> 
> As I said before tracking is the easy part of the job. Defining this 
> generic component helps a little bit writing new drivers, but it leaves 
> way to much room for speculations on the UAPI.
> 

Trying to move forward, I agree that a unified UAPI would improve the 
situation regarding the problems you mentioned and the examples you have 
given.

However, not having the GPUVA manager wouldn't give us a unified UAPI 
either. And as long as it delivers a generic component to solve a 
problem while not making the overall situation worse or preventing us 
from reaching this desirable goal of having a unified UAPI I tend to 
think it's fine to have such a component.

>> Also, wouldn't we need something like the GPUVA manager as part of a 
>> unified UAPI?
> 
> Not necessarily. We can write components to help drivers implement the 
> UAPI, but this isn't mandatory.

Well, yes, not necessarily. However, as mentioned above, wouldn't it be 
a major goal of a unified UAPI to be able to attach generic components 
to it?

> 
>>
>>>
>>> For example we could have a DRM_IOCTL_GPU_VM which takes both driver 
>>> independent as well as driver dependent information and then has the 
>>> documented behavior:
>>> a) VAs do (or don't) vanish automatically when the GEM handle is closed.
>>> b) GEM BOs do (or don't) get an additional reference for each VM they 
>>> are used in.
>>> c) Can handle some common use cases driver independent (BO mappings, 
>>> readonly, writeonly, sparse etc...).
>>> d) Has a well defined behavior when the operation is executed async. 
>>> E.g. in/out fences.
>>> e) Can still handle hw specific stuff like (for example) trap on 
>>> access etc....
>>> ...
>>>
>>> Especially d is what Bas and I have pretty much already created a 
>>> prototype for the amdgpu specific IOCTL for, but essentially this is 
>>> completely driver independent and actually the more complex stuff. 
>>> Compared to that common lifetime of BOs is just nice to have.
>>>
>>> I strongly think we should concentrate on getting this right as well.
>>>
>>>> Now if merging is a feature that makes sense to one driver maybe it
>>>> makes sense to all, however there may be reasons amdgpu gets away
>>>> without merging that other drivers might not benefit from, there might
>>>> also be a benefit to amdgpu from merging that you haven't looked at
>>>> yet, so I think we could leave merging as an optional extra driver
>>>> knob here. The userspace API should operate the same, it would just be
>>>> the gpu pagetables that would end up different sizes.
>>>
>>> Yeah, agree completely. The point is that we should not have 
>>> complexity inside the kernel which is not necessarily needed in the 
>>> kernel.
>>>
>>> So merging or not is something we have gone back and forth for 
>>> amdgpu, one the one hand it reduces the memory footprint of the 
>>> housekeeping overhead on the other hand it makes the handling more 
>>> complex, error prone and use a few more CPU cycles.
>>>
>>> For amdgpu merging is mostly beneficial when you can get rid of a 
>>> whole page tables layer in the hierarchy, but for this you need to 
>>> merge at least 2MiB or 1GiB together. And since that case doesn't 
>>> happen that often we stopped doing it.
>>>
>>> But for my understanding why you need the ranges for the merging? 
>>> Isn't it sufficient to check that the mappings have the same type, 
>>> flags, BO, whatever backing them?
>>
>> Not entirely. Let's assume userspace creates two virtually contiguous 
>> buffers (VKBuffer) A and B. Userspace could bind a BO with BO offset 0 
>> to A (binding 1) and afterwards bind the same BO with BO offset 
>> length(A) to B (binding 2), maybe unlikely but AFAIK not illegal.
>>
>> If we don't know about the bounds of A and B in the kernel, we detect 
>> that both bindings are virtually and physically contiguous and we 
>> merge them.
> 
> Well as far as I can see this is actually legal and desirable.

Legal, not sure, may depend on the semantics of the UAPI. (More on that 
below your next paragraph.)

Desirable, I don't think so. Since those mappings are associated with 
different VKBuffers they get split up later on anyway, hence why bother 
merging?

> 
>>
>> In the best case this was simply useless, because we'll need to split 
>> them anyway later on when A or B is destroyed, but in the worst case 
>> we could fault the GPU, e.g. if merging leads to a change of the page 
>> tables that are backing binding 1, but buffer A is already in use by 
>> userspace.
> 
> WOW wait a second, regions absolutely don't help you with that anyway.
> 
> You need to keep track which mappings are used or otherwise any 
> modification could lead to problems.
> 
> In other words when the GPU already uses A you *must* have a fence on 
> the page tables backing A to prevent their destruction.
> 

As mentioned above, I'm not entirely sure about that and it might just 
depend on the semantics of the UAPI.

My understanding is that userspace is fully responsible on the parts of 
the GPU VA space it owns. This means that userspace needs to take care 
to *not* ask the kernel to modify mappings that are in use currently. 
Hence, the kernel is in charge to not modify mappings it set up on 
behalf of userspace unless userspace explicitly asks the kernel to do so.

If those are valid preconditions, and based on them we want to support 
merging, the kernel must know about the VA space allocations (or 
VKBuffers in userspace terminology) to make sure it never merges across 
their boundaries, which might not make much sense anyway.

>>
>> In Nouveau, I think we could also get rid of regions and do something 
>> driver specific for the handling of the dual page tables, which I want 
>> to use for sparse regions *and* just don't merge (at least for now). 
>> But exactly for the sake of not limiting drivers in their HW specifics 
>> I thought it'd be great if merging is supported in case it makes sense 
>> for a specific HW, especially given the fact that memory sizes are 
>> increasing.
> 
> What do you mean with that?
> 
> If you want your page tables to be modifiable while the GPU is using 
> them (which is basically a standard requirement from sparse bindings in 
> Vulkan) you need double housekeeping anyway.
> 
> Those regions strongly sound like you are pushing stuff which should be 
> handled in userspace inside the kernel.

1. userspace allocates a new VKBuffer with the sparse bit set (0x0 - 
0x800000)

2. kernel creates a new region structure with the range 0x800000 and 
creates a new PT (A) with 4 PTEs with the sparse flag set (page shift is 21)

3. userspace requests a memory backed mapping at 0x200000 with size 0x2000

4. kernel creates a new mapping structure with base address 0x200000 and 
range 0x2000 and creates a new PT (B) with 2 PTEs (page shift is 12) 
"overlaying" PT A

5. userspace crashes unexpectedly for some reason

6. kernel needs to clean things up, iterates the list of mappings and 
unmaps them (PT B is freed); kernel iterates all regions and removes 
them (PT A is freed)

> 
> Regards,
> Christian.
> 
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>>
>>>> Dave.
>>>
>>
>