Message-ID: <20200619094309.GT20149@phenom.ffwll.local>
Date: Fri, 19 Jun 2020 11:43:09 +0200
From: Daniel Vetter <daniel@...ll.ch>
To: Chris Wilson <chris@...is-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@...ll.ch>,
amd-gfx mailing list <amd-gfx@...ts.freedesktop.org>,
linux-rdma <linux-rdma@...r.kernel.org>,
Intel Graphics Development <intel-gfx@...ts.freedesktop.org>,
LKML <linux-kernel@...r.kernel.org>,
DRI Development <dri-devel@...ts.freedesktop.org>,
"moderated list:DMA BUFFER SHARING FRAMEWORK"
<linaro-mm-sig@...ts.linaro.org>,
Thomas Hellstrom <thomas.hellstrom@...el.com>,
Daniel Vetter <daniel.vetter@...el.com>,
Mika Kuoppala <mika.kuoppala@...el.com>,
Christian König <christian.koenig@....com>,
Linux Media Mailing List <linux-media@...r.kernel.org>
Subject: Re: [Intel-gfx] [PATCH 03/18] dma-fence: basic lockdep annotations
On Fri, Jun 19, 2020 at 10:13:35AM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2020-06-19 09:51:59)
> > On Fri, Jun 19, 2020 at 10:25 AM Chris Wilson <chris@...is-wilson.co.uk> wrote:
> > > Forcing a generic primitive to always be part of the same global map is
> > > horrible.
> >
> > And no concrete example or reason why that's not possible.
> > Because frankly it's not horrible - this is what upstream is all
> > about: shared concepts, shared contracts, shared code.
> >
> > The proposed patches might very well encode the wrong contract, that's
> > all up for discussion. But fundamentally questioning that we need one
> > is missing what upstream is all about.
>
> Then I have not communicated clearly: my opinion is not that
> validation is worthless, but that the implementation enshrines a
> global property on a low level primitive, which prevents it from being
> used elsewhere. And I want to replace completion [chains] with fences,
> and bio with fences, and closures with fences, and whatever other
> equivalent constructs there are in the kernel. The fence is as central
> a locking construct as struct completion, and deserves to be a
> foundational primitive provided by kernel/ and used throughout all
> drivers for discrete problem domains.
>
> This is narrowing dma_fence, whereas adding
> 	struct lockdep_map *dma_fence::wait_map
> and annotating the linkage would allow you to continue to specify that
> all dma_fence used for a particular purpose must follow common rules,
> without restricting the primitive for uses outside of this scope.
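If I'm reading that right, the sketch would be roughly the below - the
field, the init helper and the per-purpose maps are all my guesses at
what you mean, nothing that has actually been posted:

/* hypothetical sketch, not in any posted patch */
struct dma_fence {
	/* ... existing members ... */
	struct lockdep_map *wait_map;	/* per-purpose lockdep class */
};

/* all fences used for dma-buf syncing would share one class ... */
static struct lockdep_map dma_buf_fence_map =
	STATIC_LOCKDEP_MAP_INIT("dma_buf_fence", &dma_buf_fence_map);

/* ... while an unrelated user could supply its own */
static struct lockdep_map my_sched_fence_map =
	STATIC_LOCKDEP_MAP_INIT("my_sched_fence", &my_sched_fence_map);

/*
 * dma_fence_wait() and the signalling annotations would then use
 * fence->wait_map instead of one global map.
 */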
Somewhere else in this thread I've been discussing this topic with Jason
Gunthorpe. His take might change somewhat depending on the exact rules,
but it is very much "I don't want dma_fence in rdma", or pretty close to
that at least.
There have been similar discussions with habanalabs - they're using
dma_fence internally without any of the uapi. The discussion there has
now also concluded that it's best if they remove them, and simply switch
over to a wait_queue or completion like every other driver does.
The next round of the patches already has a paragraph to at least
somewhat limit how non-gpu drivers use dma_fence. And I guess actual
consensus might point even more strongly at dma_fence being solely
something for gpus and closely related subsystems (maybe media), for
syncing dma-buf access.
So I think dma_fence as a general replacement for completion chains just
won't happen.
What might make sense is if e.g. the lockdep annotations could be reused,
at least in design, for wait_queue or completion or anything else
really. I do think that has a fair chance compared to the automagic
cross-release annotations approach, which relied way too heavily on
guessing where barriers are. My experience from just a bit of playing
around with these patches here and discussing them with other driver
maintainers is that accurately deciding where critical sections start and
end is a job for humans only. And if you get it wrong, you will have a
false positive.
And you're indeed correct that if we did annotations for completions and
wait queues, they would need a class per semantically equivalent user,
like we have lockdep classes for mutexes, not just one class overall.
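Roughly like this, I'd imagine - completion_begin_signalling() and
friends are entirely made up here, purely to illustrate the per-class
shape:

/* entirely hypothetical, nothing like this exists upstream */
struct annotated_completion {
	struct completion base;
	struct lockdep_map *map; /* shared by semantically equivalent users */
};

/* signalling side, annotated as a read lock so sections can nest */
static void completion_begin_signalling(struct annotated_completion *c)
{
	lock_acquire(c->map, 0, 0, 1, 1, NULL, _RET_IP_);
}

static void completion_end_signalling(struct annotated_completion *c)
{
	lock_release(c->map, _RET_IP_);
}

/* waiting side records the dependency with lockdep before blocking */
static void annotated_wait_for_completion(struct annotated_completion *c)
{
	lock_acquire(c->map, 0, 0, 0, 1, NULL, _RET_IP_);
	lock_release(c->map, _RET_IP_);
	wait_for_completion(&c->base);
}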
But dma_fence otoh is something very specific, which comes with very
specific rules attached - it's not a generic wait_queue at all. It did
originally start out as one, but by now it's a very specialized
wait_queue.
So imo there are two cases:
- Your completion is entirely orthogonal to dma_fences and can never
  ever block a dma_fence. Don't use dma_fence for this, and there's no
  problem - it's just another wait_queue somewhere.
- Your completion can eventually, maybe through lots of convolutions and
  dependencies, block a dma_fence. In that case full dma_fence rules
  apply, and the only thing you can do with a custom annotation is make
  the rules even stricter - e.g. if a sub-timeline in the scheduler
  isn't allowed to take certain scheduler locks, but the userspace
  visible/published fences do take them, maybe as part of command
  submission or retirement. Entirely hypothetical, no idea whether any
  driver actually needs this; rough sketch of the baseline below.
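For reference, the baseline for the second case with the annotations
from this patch would look roughly like the below - the driver bits
(my_retire_lock, struct my_job) are made up:

/* made-up driver code, using the annotations this series adds */
static void my_driver_retire_job(struct my_job *job)
{
	bool cookie;

	cookie = dma_fence_begin_signalling();

	/*
	 * lockdep will now complain if my_retire_lock is ever held
	 * somewhere else while waiting on a dma_fence
	 */
	spin_lock(&my_retire_lock);
	list_del(&job->link);
	spin_unlock(&my_retire_lock);

	dma_fence_signal(&job->done_fence);

	dma_fence_end_signalling(cookie);
}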
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch