linux-kernel - Re: [PATCH 2/2] drm/gpuvm: Add locking helpers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACSVV03WboQp_A1bzQ+xpX5DDkfaoXmbTuo9RfZ9bMaVTqdU+A@mail.gmail.com>
Date: Mon, 16 Jun 2025 15:25:08 -0700
From: Rob Clark <rob.clark@....qualcomm.com>
To: Danilo Krummrich <dakr@...hat.com>
Cc: dri-devel@...ts.freedesktop.org, linux-arm-msm@...r.kernel.org,
        freedreno@...ts.freedesktop.org,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Maxime Ripard <mripard@...nel.org>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] drm/gpuvm: Add locking helpers

On Mon, Jun 16, 2025 at 2:39 PM Danilo Krummrich <dakr@...hat.com> wrote:
>
> On Sat, Jun 14, 2025 at 08:03:20AM -0700, Rob Clark wrote:
> > On Sat, Jun 14, 2025 at 3:39 AM Danilo Krummrich <dakr@...hat.com> wrote:
> > >
> > > On Fri, Jun 13, 2025 at 04:57:03PM -0700, Rob Clark wrote:
> > > > For UNMAP/REMAP steps we could be needing to lock objects that are not
> > > > explicitly listed in the VM_BIND ioctl in order to tear-down unmapped
> > > > VAs.  These helpers handle locking/preparing the needed objects.
> > >
> > > Yes, that's a common use-case. I think drivers typically iterate through their
> > > drm_gpuva_ops to lock those objects.
> > >
> > > I had a look at you link [1] and it seems that you keep a list of ops as well by
> > > calling vm_op_enqueue() with a new struct msm_vm_op from the callbacks.
> > >
> > > Please note that for exactly this case there is the op_alloc callback in
> > > struct drm_gpuvm_ops, such that you can allocate a custom op type (i.e. struct
> > > msm_vm_op) that embedds a struct drm_gpuva_op.
> >
> > I did use drm_gpuvm_sm_xyz_ops_create() in an earlier iteration of my
> > VM_BIND series, but it wasn't quite what I was after.  I wanted to
> > apply the VM updates immediately to avoid issues with a later
> > map/unmap overlapping an earlier map, which
> > drm_gpuvm_sm_xyz_ops_create() doesn't really handle.  I'm not even
> > sure why this isn't a problem for other drivers unless userspace is
> > providing some guarantees.
>
> The drm_gpuva_ops are usually used in a pattern like this.
>
>         vm_bind {
>                 for_each_vm_bind_operation {
>                         drm_gpuva_for_each_op {
>                                 // modify drm_gpuvm's interval tree
>                                 // pre-allocate memory
>                                 // lock and prepare objects
>                         }
>                 }
>
>                 drm_sched_entity_push_job();
>         }
>
>         run_job {
>                 for_each_vm_bind_operation {
>                         drm_gpuva_for_each_op {
>                                 // modify page tables
>                         }
>                 }
>         }
>
>         run_job {
>                 for_each_vm_bind_operation {
>                         drm_gpuva_for_each_op {
>                                 // free page table structures, if any
>                                 // free unused pre-allocated memory
>                         }
>                 }
>         }
>
> What did you do instead to get map/unmap overlapping? Even more interesting,
> what are you doing now?

>From what I can tell, the drivers using drm_gpva_for_each_op()/etc are
doing drm_gpuva_remove() while iterating the ops list..
drm_gpuvm_sm_xyz_ops_create() itself does not modify the VM.  So this
can only really work if you perform one MAP or UNMAP at a time.  Or at
least if you process the VM modifying part of the ops list before
proceeding to the next op.

>
> > Once I realized I only wanted to defer the
> > application of the pgtable changes, but keep all the
> > locking/allocation/etc in the synchronous part of the ioctl,
> > vm_op_enqueue() was the natural solution.
>
> But vm_op_enqueue() creates exactly this list of operations you would get from
> drm_gpuvm_sm_{map,unmap}_ops_create(), just manually, no?

Only if each job only has a single VM_BIND MAP or UNMAP or if you
process the ops immediately.

OTOH vm_op_enqueue() generates the list of pgtable updates to perform
for a list of MAP/UNMAP ioctl ops.  I guess it would be an equivalent
of combining drm_gpuvm_sm_xyz_ops_create() plus the actual driver
callbacks in a single pass.

BR,
-R

> <snip>
>
> > > > Note that these functions do not strictly require the VM changes to be
> > > > applied before the next drm_gpuvm_sm_map_lock()/_unmap_lock() call.  In
> > > > the case that VM changes from an earlier drm_gpuvm_sm_map()/_unmap()
> > > > call result in a differing sequence of steps when the VM changes are
> > > > actually applied, it will be the same set of GEM objects involved, so
> > > > the locking is still correct.
> > >
> > > I'm not sure about this part, how can we be sure that's the case?
> >
> > I could be not imaginative enough here, so it is certainly worth a
> > second opinion.  And why I explicitly called it out in the commit msg.
> > But my reasoning is that any new op in the second pass that actually
> > applies the VM updates which results from overlapping with a previous
> > update in the current VM_BIND will only involve GEM objects from that
> > earlier update, which are already locked.
>
> Yeah, it's probably fine, since, as you say, the only additional object can be
> the req_obj from the previous iteration.
>