Message-ID: <20230302023829.kcxyukt7guacr7xg@revolver>
Date: Wed, 1 Mar 2023 21:38:29 -0500
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Danilo Krummrich <dakr@...hat.com>
Cc: airlied@...il.com, daniel@...ll.ch, tzimmermann@...e.de,
mripard@...nel.org, corbet@....net, christian.koenig@....com,
bskeggs@...hat.com, matthew.brost@...el.com,
boris.brezillon@...labora.com, alexdeucher@...il.com,
ogabbay@...nel.org, bagasdotme@...il.com, willy@...radead.org,
jason@...kstrand.net, dri-devel@...ts.freedesktop.org,
nouveau@...ts.freedesktop.org, linux-doc@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Dave Airlie <airlied@...hat.com>
Subject: Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA
mappings
* Danilo Krummrich <dakr@...hat.com> [230227 08:17]:
...
> > > Would this variant be significantly more efficient?
> >
> > Well, what you are doing is walking the tree to see if there's anything
> > there... then re-walking the tree to store it. So, yes, it's much more
> > efficient.. However, writing is heavier. How much of the time is spent
> > walking vs writing depends on the size of the tree, but it's rather easy
> > to do this in a single walk of the tree so why wouldn't you?
>
> I will, I was just curious about how much of an impact it has.
>
> >
> > >
> > > Also, would this also work while already walking the tree?
> >
> > Yes, to an extent. If you are at the correct location in the tree, you
> > can write to that location. If you are not in the correct location and
> > try to write to the tree then things will go poorly.. In this scenario,
> > we are very much walking the tree and writing to it in two steps.
> >
> > >
> > > To remove an entry while walking the tree I have a separate function
> > > drm_gpuva_iter_remove(). Would I need something similar for inserting
> > > entries?
> >
> > I saw that. Your remove function uses the erase operation which is
> > implemented as a walk to that location and a store of a null over the
> > range that is returned. You do not need a function to insert an entry
> > if the maple state is at the correct location, and that doesn't just
> > mean setting mas.index/mas.last to the correct value. There is a node &
> > offset saved in the maple state that needs to be in the correct
> > location. If you store to that node then the node may be replaced, so
> > other iterators that you have may become stale, but the one you used
> > to execute the store operation will now point to the new node with the
> > new entry.
> >
> > >
> > > I already provided this example in a separate mail thread, but it may make
> > > sense to move this to the mailing list:
> > >
> > > In __drm_gpuva_sm_map() we're iterating a given range of the tree, where the
> > > given range is the size of the newly requested mapping. __drm_gpuva_sm_map()
> > > invokes a callback for each sub-operation that needs to be taken in order to
> > > fulfill this mapping request. In most cases such a callback just creates a
> > > drm_gpuva_op object and stores it in a list.
> > >
> > > However, drivers can also implement the callback, such that they directly
> > > execute this operation within the callback.
> > >
> > > Let's have a look at the following example:
> > >
> > > 0 a 2
> > > old: |-----------| (bo_offset=n)
> > >
> > > 1 b 3
> > > req: |-----------| (bo_offset=m)
> > >
> > > 0 a' 1 b 3
> > > new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
> > >
> > > This would result in the following operations.
> > >
> > > __drm_gpuva_sm_map() finds entry "a" and calls back into the driver
> > > suggesting to re-map "a" with the new size. The driver removes entry "a"
> > > from the tree and adds "a'"
> >
> > What you have here won't work. The driver will cause your iterator's
> > maple state to point to memory that is freed. You will either need to
> > pass through your iterator so that the modifications can occur with that
> > maple state so it remains valid, or you will need to invalidate the
> > iterator on every modification by the driver.
> >
> > I'm sure the first idea you have will be to invalidate the iterator, but
> > that is probably not the way to proceed. Even ignoring the unclear
> > locking of two maple states trying to modify the tree, this is rather
> > inefficient - each invalidation means a re-walk of the tree. You may as
> > well not use an iterator in this case.
> >
> > Depending on how/when the lookups occur, you could still iterate over
> > the tree and let the driver modify the ending of "a", but leave the tree
> > alone and just store b over whatever - but the failure scenarios may
> > cause you grief.
> >
> > If you pass the iterator through, then you can just use it to do your
> > writes and keep iterating as if nothing changed.
>
> Passing through the iterator clearly seems to be the way to go.
>
> I assume that if the entry to insert isn't at the location of the iterator
> (as in the following example) we can just keep walking to this location by
> changing the index of the mas and calling mas_walk()?
No. You have to mas_set() to the value and walk from the top of the
tree. mas_walk() walks down, not from side to side - well, it does go
forward within a node (increasing offset), but if you hit the node limit
then you have gotten yourself in trouble.
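Roughly, the difference looks like this (an illustrative sketch of the maple
tree calls, not compiled; "tree" and the indices are placeholders):

```
MA_STATE(mas, &tree, index, index);
entry = mas_walk(&mas);     /* walks down from the root to @index */

/* to look somewhere else, reset first - don't just bump the index */
mas_set(&mas, new_index);   /* resets the state to MAS_START */
entry = mas_walk(&mas);     /* fresh walk from the top of the tree */
```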
> This would also imply
> that the "outer" tree walk continues after the entry we just inserted,
> right?
I don't understand the "outer" tree walk statement.
>
> 1 a 3
> old: |-----------| (bo_offset=n)
>
> 0 b 2
> req: |-----------| (bo_offset=m)
>
> 0 b 2 a' 3
> new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>
> Again, after finding "a", we want to remove it and insert "a'" instead.
Ah, so you could walk to 0, see that it's NULL from 0 - 1, call
mas_next() and get "a" from 1 - 3, write "a'" from 2 - 3:
0 1 a 2 a' 3
broken: |-----|------|-----| (a is broken in this 1/2 step)
mas_set_range(&mas, 0, 2); /* Resets the tree location to MAS_START */
mas_store(&mas, b);
0 b 2 a' 3
new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
You can *probably* also get away with this:
walk to 0, see that it's NULL from 0 - 1, call mas_next() and get "a"
from 1 - 3, write "a'" from 2 - 3:
0 1 a 2 a' 3
broken: |-----|------|-----| (a is broken in this 1/2 step)
mas_prev(&mas, 0); /* Looking at broken a from 1-2. */
mas_store(&mas, NULL); /* NULL is expanded on write to 0-2. */
0 NULL 2 a' 3
broken': |-----------|-----| (NULL from 0-2, a'.bo_offset=n+2)
mas_store(&mas, b);
0 b 2 a' 3
new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
You may want to iterate backwards and do the writes as you go until you
have enough room.. it really depends how you want to go about doing
things.
>
> >
> > >
> > > __drm_gpuva_sm_map(), ideally, continues the loop searching for nodes
> > > starting from the end of "a" (which is 2) till the end of the requested
> > > mapping "b" (which is 3). Since it doesn't find any other mapping within
> > > this range it calls back into the driver suggesting to finally map "b".
> > >
> > > If there would have been another mapping between 2 and 3 it would have
> > > called back into the driver asking to unmap this mapping beforehand.
> > >
> > > So, it boils down to re-mapping as described at the beginning (and
> > > analogously at the end) of a new mapping range and removing of entries that
> > > are enclosed by the new mapping range.
> >
> > I assume the unmapped area is no longer needed, and the 're-map' is
> > really a removal of information? Otherwise I'd suggest searching for a
> > gap which fits your request. What you have here is a lot like
> > "MAP_FIXED" vs top-down/bottom-up search in the VMA code, this seems to
> > be like your __drm_gpuva_sm_map() and the drm mm range allocator with
> > DRM_MM_INSERT_LOW, and DRM_MM_INSERT_HIGH.
> >
> > Why can these split/unmappings fail? Is it because they are still
> > needed?
> >
>
> You mean the check before the mas_*() operations in drm_gpuva_insert()?
Yes, the callbacks.
>
> Removing entries should never fail. Inserting entries should fail when the
> caller tries to store to an area outside of the VA space (it doesn't
> necessarily span the whole 64-bit space), to a kernel-reserved area of the
> VA space, or to an area not within any pre-allocated range of the VA space
> (if regions are enabled), or when an entry already exists at that location.
In the mmap code, I have to deal with splitting the start/end VMA and
removing any VMAs in the way. I do this by making a 'detached' tree
that is dealt with later, then just overwriting the area with one
mas_store() operation. Would something like that work for you?
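For reference, the shape of that is roughly (pseudocode; the splitting of the
boundary VMAs and all error handling are elided):

```
MA_STATE(mas, &mm->mm_mt, start, end - 1);
mas_for_each(&mas, vma, end - 1) {
        /* split a partially-overlapped start/end VMA if needed, then
         * move each victim into the 'detached' tree for later cleanup */
}
mas_set_range(&mas, start, end - 1);
mas_store_gfp(&mas, new_vma, GFP_KERNEL);  /* one overwrite of the span */
```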
>
> > >
> > > > > + if (unlikely(ret))
> > > > > + return ret;
> > > > > +
> > > > > + va->mgr = mgr;
> > > > > + va->region = reg;
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_insert);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuva_remove - remove a &drm_gpuva
> > > > > + * @va: the &drm_gpuva to remove
> > > > > + *
> > > > > + * This removes the given &va from the underlying tree.
> > > > > + */
> > > > > +void
> > > > > +drm_gpuva_remove(struct drm_gpuva *va)
> > > > > +{
> > > > > + MA_STATE(mas, &va->mgr->va_mt, va->va.addr, 0);
> > > > > +
> > > > > + mas_erase(&mas);
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_remove);
> > > > > +
> > > > ...
> > > >
> > > > > +/**
> > > > > + * drm_gpuva_find_first - find the first &drm_gpuva in the given range
> > > > > + * @mgr: the &drm_gpuva_manager to search in
> > > > > + * @addr: the &drm_gpuvas address
> > > > > + * @range: the &drm_gpuvas range
> > > > > + *
> > > > > + * Returns: the first &drm_gpuva within the given range
> > > > > + */
> > > > > +struct drm_gpuva *
> > > > > +drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
> > > > > + u64 addr, u64 range)
> > > > > +{
> > > > > + MA_STATE(mas, &mgr->va_mt, addr, 0);
> > > > > +
> > > > > + return mas_find(&mas, addr + range - 1);
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_find_first);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuva_find - find a &drm_gpuva
> > > > > + * @mgr: the &drm_gpuva_manager to search in
> > > > > + * @addr: the &drm_gpuvas address
> > > > > + * @range: the &drm_gpuvas range
> > > > > + *
> > > > > + * Returns: the &drm_gpuva at a given &addr and with a given &range
> > > >
> > > > Note that mas_find() will continue upwards in the address space if there
> > > > isn't anything at @addr. This means that &drm_gpuva may not be at
> > > > &addr. If you want to check just at &addr, use mas_walk().
> > >
> > > Good catch. drm_gpuva_find() should then either also check for 'va->va.addr
> > > == addr' as well or, alternatively, use mas_walk(). As above, any reason to
> > > prefer mas_walk()?
I think I missed this question last time..
Internally, mas_find() is just a mas_walk() on the first call, then
mas_next() for each call after that. If, during the mas_walk(), there
is no value at addr, it immediately calls mas_next() to get a value to
return. It will continue upwards until the limit is reached (addr +
range - 1 in your case).
So if you only want to know if there is something at addr, then it's
best to use mas_walk() and keep things a bit more efficient. Then you
can check mas.last for your end value.
If you do want the first VMA within the range passed in, then mas_find()
is the function you want.
> > >
> > > >
> > > > > + */
> > > > > +struct drm_gpuva *
> > > > > +drm_gpuva_find(struct drm_gpuva_manager *mgr,
> > > > > + u64 addr, u64 range)
> > > > > +{
> > > > > + struct drm_gpuva *va;
> > > > > +
> > > > > + va = drm_gpuva_find_first(mgr, addr, range);
> > > > > + if (!va)
> > > > > + goto out;
> > > > > +
> > > > > + if (va->va.range != range)
> > > > > + goto out;
> > > > > +
> > > > > + return va;
> > > > > +
> > > > > +out:
> > > > > + return NULL;
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_find);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
> > > > > + * @mgr: the &drm_gpuva_manager to search in
> > > > > + * @start: the given GPU VA's start address
> > > > > + *
> > > > > + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
> > > > > + *
> > > > > + * Note that if there is any free space between the GPU VA mappings no mapping
> > > > > + * is returned.
> > > > > + *
> > > > > + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> > > > > + */
> > > > > +struct drm_gpuva *
> > > > > +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
> > > >
> > > > find_prev() usually continues beyond 1 less than the address. I found
> > > > this name confusing.
> > >
> > > Don't really get that, mind explaining?
> >
> > When I ask for the previous one in a list or tree, I think the one
> > before.. but since you are limiting your search from start to start - 1,
> > you may as well walk to start - 1 and see if one exists.
> >
> > Is that what you meant to do here?
>
> Yes, I want to know whether there is a previous entry which ends right
> before the current entry, without a gap between the two.
>
> >
> > >
> > > > You may as well use mas_walk(), it would be faster.
> > >
> > > How would I use mas_walk() for that? If I understand it correctly,
> > > mas_walk() requires me to know that start address, which I don't know for
> > > the previous entry.
> >
> > mas_walk() walks to the value you specify and returns the entry at that
> > address, not necessarily the start address, but any address in the
> > range.
> >
> > If you have a tree and store A = [0x1000 - 0x2000] and set your maple
> > state to walk to 0x1500, mas_walk() will return A, and the maple state
> > will have mas.index = 0x1000 and mas.last = 0x2000.
> >
> > You have set the maple state to start at "start" and called
> > mas_prev(&mas, start - 1). start - 1 is the lower limit, so the
> > internal implementation will walk to start then go to the previous entry
> > until start - 1.. it will stop at start - 1 and return NULL if there
> > isn't one there.
>
> Thanks for the clarification and all the other very helpful comments and
> explanations!
>
Always glad to help. The more users the tree has, the more I can see
where we may need to expand the interface to help others.
...