linux-kernel - Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200619151551.GP6578@ziepe.ca>
Date:   Fri, 19 Jun 2020 12:15:51 -0300
From:   Jason Gunthorpe <jgg@...pe.ca>
To:     Daniel Vetter <daniel@...ll.ch>
Cc:     Thomas Hellström (Intel) 
        <thomas_os@...pmail.org>,
        DRI Development <dri-devel@...ts.freedesktop.org>,
        linux-rdma <linux-rdma@...r.kernel.org>,
        Intel Graphics Development <intel-gfx@...ts.freedesktop.org>,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>,
        amd-gfx list <amd-gfx@...ts.freedesktop.org>,
        "moderated list:DMA BUFFER SHARING FRAMEWORK" 
        <linaro-mm-sig@...ts.linaro.org>,
        Thomas Hellstrom <thomas.hellstrom@...el.com>,
        Daniel Vetter <daniel.vetter@...el.com>,
        "open list:DMA BUFFER SHARING FRAMEWORK" 
        <linux-media@...r.kernel.org>,
        Christian König <christian.koenig@....com>,
        Mika Kuoppala <mika.kuoppala@...el.com>
Subject: Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep
 annotations

On Fri, Jun 19, 2020 at 05:06:04PM +0200, Daniel Vetter wrote:
> On Fri, Jun 19, 2020 at 1:39 PM Jason Gunthorpe <jgg@...pe.ca> wrote:
> >
> > On Fri, Jun 19, 2020 at 09:22:09AM +0200, Daniel Vetter wrote:
> > > > As I've understood GPU that means you need to show that the commands
> > > > associated with the buffer have completed. This is all local stuff
> > > > within the driver, right? Why use fence (other than it already exists)
> > >
> > > Because that's the end-of-dma thing. And it's cross-driver for the
> > > above reasons, e.g.
> > > - device A renders some stuff. Userspace gets dma_fence A out of that
> > > (well sync_file or one of the other uapi interfaces, but you get the
> > > idea)
> > > - userspace (across process or just different driver) issues more
> > > rendering for device B, which depends upon the rendering done on
> > > device A. So dma_fence A is an dependency and will block this dma
> > > operation. Userspace (and the kernel) gets dma_fence B out of this
> > > - because unfortunate reasons, the same rendering on device B also
> > > needs a userptr buffer, which means that dma_fence B is also the one
> > > that the mmu_range_notifier needs to wait on before it can tell core
> > > mm that it can go ahead and release those pages
> >
> > I was afraid you'd say this - this is complete madness for other DMA
> > devices to borrow the notifier hook of the first device!
> 
> The first device might not even have a notifier. This is the 2nd
> device, waiting on a dma_fence of its own, but which happens to be
> queued up as a dma operation behind something else.
> 
> > What if the first device is a page faulting device and doesn't call
> > dma_fence??
> 
> Not sure what you mean with this ... even if it does page-faulting for
> some other reasons, it'll emit a dma_fence which the 2nd device can
> consume as a dependency.

At some point the pages under the buffer have to be either pinned
or protected by mmu notifier. So each and every single device doing
DMA to these pages must either pin, or use mmu notifier.

Driver A should never 'borrow' a notifier from B

If each driver controls its own lifetime of the buffers, why can't the
driver locally wait for its device to finish?

Can't the GPUs cancel work that is waiting on a DMA fence? Ie if
Driver A detects that work completed and wants to trigger a DMA fence,
but it now knows the buffer is invalidated, can't it tell driver B to
give up?

> The problem is that there's piles of other dependencies for a dma job.
> GPU doesn't just consume a single buffer each time, it consumes entire
> lists of buffers and mixes them all up in funny ways. Some of these
> buffers are userptr, entirely local to the device. Other buffers are
> just normal device driver allocations (and managed with some shrinker
> to keep them in check). And then there's the actually shared dma-buf
> with other devices. The trouble is that they're all bundled up
> together.

But why does this matter? Does the GPU itself consume some work and
then stall internally waiting for an external DMA fence?

Otherwise I would expect this dependency chain should be breakable by
aborting work waiting on fences upon invalidation (without stalling)

> > Do not need to wait on dma_fence in notifiers.
> 
> Maybe :-) The goal of this series is more to document current rules
> and make them more consistent. Fixing them if we don't like them might
> be a follow-up task, but that would likely be a pile more work. First
> we need to know what the exact shape of the problem even is.

Fair enough

Jason