lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <53CE93D4.3010204@amd.com> Date: Tue, 22 Jul 2014 18:39:48 +0200 From: Christian König <christian.koenig@....com> To: Daniel Vetter <daniel.vetter@...ll.ch>, Christian König <deathsimple@...afone.de> CC: Thomas Hellstrom <thellstrom@...are.com>, nouveau <nouveau@...ts.freedesktop.org>, LKML <linux-kernel@...r.kernel.org>, dri-devel <dri-devel@...ts.freedesktop.org>, "Deucher, Alexander" <alexander.deucher@....com>, Ben Skeggs <bskeggs@...hat.com> Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences > Maybe I've mixed things up a bit in my description. There is > fence_signal which the implementor/exporter of a fence must call when > the fence is completed. If the exporter has an ->enable_signaling > callback it can delay that call to fence_signal for as long as it > wishes as long as enable_signaling isn't called yet. But that's just > the optimization to not required irqs to be turned on all the time. > > The other function is fence_is_signaled, which is used by code that is > interested in the fence state, together with fence_wait if it wants to > block and not just wants to know the momentary fence state. All the > other functions (the stuff that adds callbacks and the various _locked > and other versions) are just for fancy special cases. Well that's rather bad, cause IRQs aren't reliable enough on Radeon HW for such a thing. Especially on Prime systems and Macs. That's why we have this fancy HZ/2 timeout on all fence wait operations to manually check if the fence is signaled or not. To guarantee that a fence is signaled after enable_signaling is called we would need to fire up a kernel thread which periodically calls fence->signaled. Christian. Am 22.07.2014 18:21, schrieb Daniel Vetter: > On Tue, Jul 22, 2014 at 5:59 PM, Christian König > <deathsimple@...afone.de> wrote: >> Am 22.07.2014 17:42, schrieb Daniel Vetter: >> >>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König >>> <christian.koenig@....com> wrote: >>>> Drivers exporting fences need to provide a fence->signaled and a >>>> fence->wait >>>> function, everything else like fence->enable_signaling or calling >>>> fence_signaled() from the driver is optional. >>>> >>>> Drivers wanting to use exported fences don't call fence->signaled or >>>> fence->wait in atomic or interrupt context, and not with holding any >>>> global >>>> locking primitives (like mmap_sem etc...). Holding locking primitives >>>> local >>>> to the driver is ok, as long as they don't conflict with anything >>>> possible >>>> used by their own fence implementation. >>> Well that's almost what we have right now with the exception that >>> drivers are allowed (actually must for correctness when updating >>> fences) the ww_mutexes for dma-bufs (or other buffer objects). >> >> In this case sorry for so much noise. I really haven't looked in so much >> detail into anything but Maarten's Radeon patches. >> >> But how does that then work right now? My impression was that it's mandatory >> for drivers to call fence_signaled()? > Maybe I've mixed things up a bit in my description. There is > fence_signal which the implementor/exporter of a fence must call when > the fence is completed. If the exporter has an ->enable_signaling > callback it can delay that call to fence_signal for as long as it > wishes as long as enable_signaling isn't called yet. But that's just > the optimization to not required irqs to be turned on all the time. > > The other function is fence_is_signaled, which is used by code that is > interested in the fence state, together with fence_wait if it wants to > block and not just wants to know the momentary fence state. All the > other functions (the stuff that adds callbacks and the various _locked > and other versions) are just for fancy special cases. > >>> Locking >>> correctness is enforced with some extremely nasty lockdep annotations >>> + additional debugging infrastructure enabled with >>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. We really need to be able to hold >>> dma-buf ww_mutexes while updating fences or waiting for them. And >>> obviously for ->wait we need non-atomic context, not just >>> non-interrupt. >> >> Sounds mostly reasonable, but for holding the dma-buf ww_mutex, wouldn't be >> an RCU be more appropriate here? E.g. aren't we just interested that the >> current assigned fence at some point is signaled? > Yeah, as an optimization you can get the set of currently attached > fences to a dma-buf with just rcu. But if you update the set of fences > attached to a dma-buf (e.g. radeon blits the newly rendered frame to a > dma-buf exported by i915 for scanout on i915) then you need a write > lock on that buffer. Which is what the ww_mutex is for, to make sure > that you don't deadlock with i915 doing concurrent ops on the same > underlying buffer. > >> Something like grab ww_mutexes, grab a reference to the current fence >> object, release ww_mutex, wait for fence, release reference to the fence >> object. > Yeah, if the only thing you want to do is wait for fences, then the > rcu-protected fence ref grabbing + lockless waiting is all you need. > But e.g. in an execbuf you also need to update fences and maybe deep > down in the reservation code you notice that you need to evict some > stuff and so need to wait on some other guy to finish, and it's too > complicated to drop and reacquire all the locks. Or you simply need to > do a blocking wait on other gpus (because there's no direct hw sync > mechanism) and again dropping locks would needlessly complicate the > code. So I think we should allow this just to avoid too hairy/brittle > (and almost definitely little tested code) in drivers. > > Afaik this is also the same way ttm currently handles things wrt > buffer reservation and eviction. > >>> Agreed that any shared locks are out of the way (especially stuff like >>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is >>> really bad here still). >> >> Yeah that's also an point I've wanted to note on Maartens patch. Radeon >> grabs the read side of it's exclusive semaphore while waiting for fences >> (because it assumes that the fence it waits for is a Radeon fence). >> >> Assuming that we need to wait in both directions with Prime (e.g. Intel >> driver needs to wait for Radeon to finish rendering and Radeon needs to wait >> for Intel to finish displaying), this might become a perfect example of >> locking inversion. > fence updates are atomic on a dma-buf, protected by ww_mutex. The neat > trick of ww_mutex is that they enforce a global ordering, so in your > scenario either i915 or radeon would be first and you can't deadlock. > There is no way to interleave anything even if you have lots of > buffers shared between i915/radeon. Wrt deadlocking it's exactly the > same guarantees as the magic ttm provides for just one driver with > concurrent command submission since it's the same idea. > >>> So from the core fence framework I think we already have exactly this, >>> and we only need to adjust the radeon implementation a bit to make it >>> less risky and invasive to the radeon driver logic. >> >> Agree. Well the biggest problem I see is that exclusive semaphore I need to >> take when anything calls into the driver. For the fence code I need to move >> that down into the fence->signaled handler, cause that now can be called >> from outside the driver. >> >> Maarten solved this by telling the driver in the lockup handler (where we >> grab the write side of the exclusive lock) that all interrupts are already >> enabled, so that fence->signaled hopefully wouldn't mess with the hardware >> at all. While this probably works, it just leaves me with a feeling that we >> are doing something wrong here. > I'm not versed on the details in readon, but on i915 we can attach a > memory location and cookie value to each fence and just do a memory > fetch to figure out whether the fence has passed or not. So no locking > needed at all. Of course the fence itself needs to lock a reference > onto that memory location, which is a neat piece of integration work > that we still need to tackle in some cases - there's conflicting patch > series all over this ;-) > > But like I've said fence->signaled is optional so you don't need this > necessarily, as long as radeon eventually calls fence_signaled once > the fence has completed. > -Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists