Message-ID: <aFM7gyGEJSVpQe1y@pollux>
Date: Thu, 19 Jun 2025 00:19:47 +0200
From: Danilo Krummrich <dakr@...hat.com>
To: Rob Clark <rob.clark@....qualcomm.com>
Cc: dri-devel@...ts.freedesktop.org, linux-arm-msm@...r.kernel.org,
	freedreno@...ts.freedesktop.org,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] drm/gpuvm: Add locking helpers

On Wed, Jun 18, 2025 at 02:56:37PM -0700, Rob Clark wrote:
> On Wed, Jun 18, 2025 at 2:23 PM Danilo Krummrich <dakr@...hat.com> wrote:
> >
> > On Tue, Jun 17, 2025 at 06:43:21AM -0700, Rob Clark wrote:
> > > On Tue, Jun 17, 2025 at 5:48 AM Rob Clark <rob.clark@....qualcomm.com> wrote:
> > > >
> > > > On Tue, Jun 17, 2025 at 2:51 AM Danilo Krummrich <dakr@...hat.com> wrote:
> > > > >
> > > > > On Mon, Jun 16, 2025 at 03:25:08PM -0700, Rob Clark wrote:
> > > > > > On Mon, Jun 16, 2025 at 2:39 PM Danilo Krummrich <dakr@...hat.com> wrote:
> > > > > > >
> > > > > > > On Sat, Jun 14, 2025 at 08:03:20AM -0700, Rob Clark wrote:
> > > > > > > > On Sat, Jun 14, 2025 at 3:39 AM Danilo Krummrich <dakr@...hat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Jun 13, 2025 at 04:57:03PM -0700, Rob Clark wrote:
> > > > > > > > > > For UNMAP/REMAP steps we may need to lock objects that are not
> > > > > > > > > > explicitly listed in the VM_BIND ioctl in order to tear down unmapped
> > > > > > > > > > VAs.  These helpers handle locking/preparing the needed objects.
> > > > > > > > >
> > > > > > > > > Yes, that's a common use-case. I think drivers typically iterate through their
> > > > > > > > > drm_gpuva_ops to lock those objects.
> > > > > > > > >
> > > > > > > > > I had a look at your link [1] and it seems that you keep a list of ops as well by
> > > > > > > > > calling vm_op_enqueue() with a new struct msm_vm_op from the callbacks.
> > > > > > > > >
> > > > > > > > > Please note that for exactly this case there is the op_alloc callback in
> > > > > > > > > struct drm_gpuvm_ops, such that you can allocate a custom op type (i.e. struct
> > > > > > > > > msm_vm_op) that embeds a struct drm_gpuva_op.
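
For reference, the embedding pattern looks roughly like this (the my_* names
are made up, only the drm_gpuvm_ops hooks and GPUVM types are the real ones):

	struct my_vm_op {
		struct drm_gpuva_op op;		/* embedded op handed back by GPUVM */
		struct list_head node;		/* driver-private list linkage */
	};

	static struct drm_gpuva_op *my_op_alloc(void)
	{
		struct my_vm_op *op = kzalloc(sizeof(*op), GFP_KERNEL);

		return op ? &op->op : NULL;
	}

	static void my_op_free(struct drm_gpuva_op *op)
	{
		kfree(container_of(op, struct my_vm_op, op));
	}

	static const struct drm_gpuvm_ops my_gpuvm_ops = {
		.op_alloc = my_op_alloc,
		.op_free  = my_op_free,
		/* plus vm_free, sm_step_map, sm_step_remap, sm_step_unmap, ... */
	};
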
> > > > > > > >
> > > > > > > > I did use drm_gpuvm_sm_xyz_ops_create() in an earlier iteration of my
> > > > > > > > VM_BIND series, but it wasn't quite what I was after.  I wanted to
> > > > > > > > apply the VM updates immediately to avoid issues with a later
> > > > > > > > map/unmap overlapping an earlier map, which
> > > > > > > > drm_gpuvm_sm_xyz_ops_create() doesn't really handle.  I'm not even
> > > > > > > > sure why this isn't a problem for other drivers unless userspace is
> > > > > > > > providing some guarantees.
> > > > > > >
> > > > > > > The drm_gpuva_ops are usually used in a pattern like this.
> > > > > > >
> > > > > > >         vm_bind {
> > > > > > >                 for_each_vm_bind_operation {
> > > > >                             drm_gpuvm_sm_xyz_ops_create();
> > > > > > >                         drm_gpuva_for_each_op {
> > > > > > >                                 // modify drm_gpuvm's interval tree
> > > > > > >                                 // pre-allocate memory
> > > > > > >                                 // lock and prepare objects
> > > > > > >                         }
> > > > > > >                 }
> > > > > > >
> > > > > > >                 drm_sched_entity_push_job();
> > > > > > >         }
> > > > > > >
> > > > > > >         run_job {
> > > > > > >                 for_each_vm_bind_operation {
> > > > > > >                         drm_gpuva_for_each_op {
> > > > > > >                                 // modify page tables
> > > > > > >                         }
> > > > > > >                 }
> > > > > > >         }
> > > > > > >
> > > > > > >         run_job {
> > > > > > >                 for_each_vm_bind_operation {
> > > > > > >                         drm_gpuva_for_each_op {
> > > > > > >                                 // free page table structures, if any
> > > > > > >                                 // free unused pre-allocated memory
> > > > > > >                         }
> > > > > > >                 }
> > > > > > >         }
> > > > > > >
> > > > > > > What did you do instead to get map/unmap overlapping? Even more interesting,
> > > > > > > what are you doing now?
> > > > > >
> > > > > > From what I can tell, the drivers using drm_gpuva_for_each_op()/etc are
> > > > > > doing drm_gpuva_remove() while iterating the ops list..
> > > > > > drm_gpuvm_sm_xyz_ops_create() itself does not modify the VM.  So this
> > > > > > can only really work if you perform one MAP or UNMAP at a time.  Or at
> > > > > > least if you process the VM modifying part of the ops list before
> > > > > > proceeding to the next op.
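
For illustration, the per-op tree update in those drivers has roughly this
shape (sketch only; locking and error handling omitted, new_va/prev_va/next_va
are assumed to be pre-allocated struct drm_gpuva):

	struct drm_gpuva_ops *ops;
	struct drm_gpuva_op *op;

	ops = drm_gpuvm_sm_map_ops_create(gpuvm, req_addr, req_range,
					  req_obj, req_offset);
	if (IS_ERR(ops))
		return PTR_ERR(ops);

	drm_gpuva_for_each_op(op, ops) {
		switch (op->op) {
		case DRM_GPUVA_OP_MAP:
			/* insert the new VA into the interval tree */
			drm_gpuva_map(gpuvm, new_va, &op->map);
			break;
		case DRM_GPUVA_OP_REMAP:
			/* split the affected VA into prev/next */
			drm_gpuva_remap(prev_va, next_va, &op->remap);
			break;
		case DRM_GPUVA_OP_UNMAP:
			/* this is the drm_gpuva_remove() part */
			drm_gpuva_unmap(&op->unmap);
			break;
		default:
			break;
		}
	}

	drm_gpuva_ops_free(gpuvm, ops);
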
> > > > >
> > > > > (Added the drm_gpuvm_sm_xyz_ops_create() step above.)
> > > > >
> > > > > I went through the code you posted [1] and conceptually you're implementing
> > > > > exactly the pattern I described above, i.e. you do:
> > > > >
> > > > >         vm_bind {
> > > > >                 for_each_vm_bind_operation {
> > > > >                         drm_gpuvm_sm_xyz_exec_lock();
> > > > >                 }
> > > > >
> > > > >                 for_each_vm_bind_operation {
> > > > >                         drm_gpuvm_sm_xyz() {
> > > > >                                 // modify drm_gpuvm's interval tree
> > > > >                                 // create custom ops
> > > > >                         }
> > > > >                 }
> > > > >
> > > > >                 drm_sched_entity_push_job();
> > > > >         }
> > > > >
> > > > >         run_job {
> > > > >                 for_each_vm_bind_operation {
> > > > >                         for_each_custom_op() {
> > > > >                                 // do stuff
> > > > >                         }
> > > > >                 }
> > > > >         }
> > > >
> > > > Close, but by the time we get to run_job there is just a single list
> > > > of ops covering all the vm_bind operations:
> > > >
> > > >         run_job {
> > > >                 for_each_custom_op() {
> > > >                         // do stuff
> > > >                 }
> > > >         }
> > > >
> > > > rather than a list of va ops per vm_bind op.
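
(Roughly this shape then, purely illustrative, not your actual structures:)

	/* One flat, driver-private op list for the whole job, filled while
	 * processing all VM_BIND operations and consumed once in run_job(). */
	struct my_vm_op {
		struct list_head node;
		enum { MY_VM_OP_MAP, MY_VM_OP_UNMAP } kind;
		u64 iova;
		u64 range;
		struct drm_gem_object *obj;
		u64 obj_offset;
	};

	struct my_job {
		struct list_head vm_ops;	/* list of struct my_vm_op */
		/* ... */
	};

	static void my_run_job(struct my_job *job)
	{
		struct my_vm_op *op;

		list_for_each_entry(op, &job->vm_ops, node) {
			/* program/tear down page tables for op->iova / op->range */
		}
	}
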
> > > >
> > > > > However, GPUVM intends to solve your use-case with the following, semantically
> > > > > identical, approach.
> > > > >
> > > > >         vm_bind {
> > > > >                 for_each_vm_bind_operation {
> > > > >                         drm_gpuvm_sm_xyz_ops_create();
> > > > >
> > > > >                         drm_gpuva_for_each_op {
> > > > >                                 // modify drm_gpuvm's interval tree
> > > > >                                 // lock and prepare objects (1)
> > > >
> > > > I currently decouple lock+pin from VM modification to avoid an error
> > > > path that leaves the VM partially modified.  Once you add this back
> > > > in, the va-ops approach isn't simpler, IMHO.
> > >
> > > Oh, actually scratch that.. using va-ops, it is not even possible to
> > > decouple locking/prepare from VM modifications.  So using
> > > DRM_EXEC_INTERRUPTIBLE_WAIT, for ex, with va-ops list would be an
> > > actively bad idea.
> >
> > Well, you would need to unwind the VM modifications. I think so far this hasn't
> > been an issue for drivers, since they have to unwind VM modifications for other
> > reasons anyway.
> 
> The only things that can fail at this point are the drm_gpuvm_sm_xyz()
> calls [1], which should only involve small kmallocs that should not
> fail.

But what happens *if* they fail? How do you handle this? If you don't unwind all
preceding changes to the GPUVM's interval tree at this point, your VM is broken.

Glancing at the code, it seems that you're allocating the GPUVA ops. And if that
fails you just return the error, leaving the VM in a broken state.

What we could do is implement a helper that calculates the worst-case number of
ops and pre-allocates them, but that's not exactly trivial.
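
Back of the envelope, for a single map request an upper bound is one op per VA
intersecting the requested range plus one for the new mapping itself, roughly
like the sketch below (my_sm_map_max_ops() is made up, and the walk would have
to run under the same lock and against the same tree state as the later ops
creation):

	static unsigned int
	my_sm_map_max_ops(struct drm_gpuvm *gpuvm, u64 req_addr, u64 req_range)
	{
		struct drm_gpuva *va;
		unsigned int count = 1;	/* the MAP op for the new mapping */

		/* worst case: every intersecting VA yields one UNMAP or REMAP op */
		drm_gpuvm_for_each_va_range(va, gpuvm, req_addr,
					    req_addr + req_range)
			count++;

		return count;
	}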

drm_gpuvm_sm_{map,unmap}_exec_lock() only really makes sense if we can prevent
the need to ever unwind, so that's a precondition.

Alternatively, you can also decide to accept that your VM is dead if you can't
allocate the GPUVA ops, I guess. In this case you'd really have to shut it down
though, otherwise you likely risk faults in your PT management code.

> But all the "usual" errors (bad params from userspace,
> interrupted locking, etc.) are dealt with before we begin to modify the
> VM.  If anything does fail after we start modifying the VM, we mark the
> VM as unusable, similar to a GPU fault.
> 
> [1] ok, I should put some drm_gpuvm_range_valid() checks earlier in
> the ioctl, while parsing out and validating args/flags/etc.  I
> overlooked this.
> 
> > Do you never need to unwind for other reasons than locking dma_resv and
> > preparing GEM objects? Are you really sure there's nothing else in the critical
> > path?
> >
> > If there really isn't anything, I agree that those helpers have value and we
> > should add them. So, if we do so, please document in detail the conditions under
> > which drm_gpuvm_sm_{map,unmap}_exec_lock() can be called for multiple VM_BIND
> > ops *without* updating GPUVM's interval tree intermediately, including an
> > example.
> 
> Ok.. in the function headerdoc comment block?  Or did you have
> somewhere else in mind?

Yeah, in the function doc-comment.

