Message-ID: <20200817162449.GC3221@jcrouse1-lnx.qualcomm.com>
Date:   Mon, 17 Aug 2020 10:24:49 -0600
From:   Jordan Crouse <jcrouse@...eaurora.org>
To:     Chris Wilson <chris@...is-wilson.co.uk>
Cc:     linux-arm-msm@...r.kernel.org,
        Gustavo Padovan <gustavo@...ovan.org>,
        linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
        linaro-mm-sig@...ts.linaro.org,
        Christian König <christian.koenig@....com>,
        linux-media@...r.kernel.org
Subject: Re: [RFC PATCH v1] dma-fence-array: Deal with sub-fences that are
 signaled late

On Thu, Aug 13, 2020 at 07:49:24AM +0100, Chris Wilson wrote:
> Quoting Jordan Crouse (2020-08-13 00:55:44)
> > This is an RFC because I'm still trying to grok the correct behavior.
> > 
> > Consider a dma_fence_array created with two fences and signal_on_any set to true.
> > A reference to dma_fence_array is taken for each waiting fence.
> > 
> > When the client calls dma_fence_wait() only one of the fences is signaled.
> > The client returns successfully from the wait and puts its reference to
> > the array fence, but the array fence still remains because of the remaining
> > unsignaled fence.
> > 
> > Now consider that the unsignaled fence is signaled while the timeline is being
> > destroyed much later. The timeline destroy calls dma_fence_signal_locked(). The
> > following sequence occurs:
> > 
> > 1) dma_fence_array_cb_func is called
> > 
> > 2) array->num_pending is already 0 (it started at 1 because of signal_on_any and
> > the first callback decremented it), so the callback function calls
> > dma_fence_put() instead of triggering the irq work
> > 
> > 3) The array fence is released, which in turn puts the lingering fence, which is
> > then released
> > 
> > 4) deadlock on the timeline lock, which is already held
> 
> It's the same recursive lock as we previously resolved in sw_sync.c by
> removing the locking from timeline_fence_release().
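A toy userspace model of the reference counting in steps 1) to 3) may help show where the final put lands. It is only a sketch: the struct, enum and function names are hypothetical stand-ins, not the kernel's dma_fence_array code, and it assumes (as the report describes) that num_pending starts at 1 for signal_on_any and that the array holds one reference for the client plus one per sub-fence callback, with the first callback's reference handed to the queued work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct dma_fence_array. */
struct toy_array {
	atomic_int num_pending; /* starts at 1 when signal_on_any is true */
	atomic_int refcount;    /* client ref + one per sub-fence callback */
	bool work_queued;       /* models queueing the array's irq work */
	bool released;          /* set when the last reference is dropped */
};

enum toy_releaser { REL_NONE, REL_CLIENT, REL_LATE_CB };

static void toy_array_put(struct toy_array *a)
{
	/* atomic_fetch_sub() returns the old value: 1 means last reference. */
	if (atomic_fetch_sub(&a->refcount, 1) == 1)
		a->released = true; /* real code would also put the sub-fences */
}

/* Models the sub-fence callback: whoever takes num_pending to zero queues
 * the signalling work; every later callback only drops its reference. */
static void toy_array_cb(struct toy_array *a)
{
	if (atomic_fetch_sub(&a->num_pending, 1) == 1)
		a->work_queued = true;
	else
		toy_array_put(a);
}

/* Replays the report: two sub-fences, signal_on_any, one late signal. */
enum toy_releaser replay_signal_on_any(void)
{
	struct toy_array a = { .work_queued = false, .released = false };

	atomic_init(&a.num_pending, 1);
	atomic_init(&a.refcount, 3);

	toy_array_cb(&a);  /* first sub-fence signals and queues the work */
	assert(a.work_queued);
	toy_array_put(&a); /* work signals the array, drops the first cb's ref */
	toy_array_put(&a); /* client returns from dma_fence_wait() and puts */
	if (a.released)
		return REL_CLIENT;

	toy_array_cb(&a);  /* second sub-fence signals much later */
	return a.released ? REL_LATE_CB : REL_NONE;
}
```

In this model replay_signal_on_any() returns REL_LATE_CB: the last reference is dropped from inside the late sub-fence callback, i.e. in whatever context signaled that fence, which is why a release path that takes the timeline lock ends up re-entering it.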

Ah, yep. I'm working on a not-quite-ready-for-primetime version of a Vulkan
timeline implementation for drm/msm and I was doing something similar to how
sw_sync used to work in the release function. Getting rid of the recursive lock
in the timeline seems a better solution than this. Thanks for taking the time
to respond.
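The recursive lock and the fix of removing the locking from the release path can be sketched the same way. In this hypothetical model an error-checking pthread mutex stands in for the timeline's spinlock, so re-taking it from the release path returns EDEADLK instead of hanging. Passing release_takes_lock = true mimics a release function that locks the timeline (the pattern sw_sync used before timeline_fence_release() was changed); false mimics the fix. None of these names are kernel APIs.

```c
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical toy timeline; the error-checking mutex reports the
 * self-deadlock that a kernel spinlock would simply hang on. */
struct toy_timeline {
	pthread_mutex_t lock;
	bool release_takes_lock; /* true models the old, locking release */
	int recursive_lock_hits;
};

static void toy_fence_release(struct toy_timeline *tl)
{
	if (!tl->release_takes_lock)
		return; /* fixed version: release never touches the timeline lock */
	if (pthread_mutex_lock(&tl->lock) == EDEADLK) {
		tl->recursive_lock_hits++; /* already held by us: would deadlock */
		return;
	}
	pthread_mutex_unlock(&tl->lock);
}

/* Models timeline destruction: signal the remaining fence with the lock
 * held; the dma_fence_array callback drops the last reference, which
 * ends up in the fence release function. Returns the recursion count. */
int toy_timeline_destroy(bool release_takes_lock)
{
	struct toy_timeline tl = { .release_takes_lock = release_takes_lock };
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&tl.lock, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&tl.lock);   /* like a dma_fence_signal_locked() caller */
	toy_fence_release(&tl);         /* reached via the array callback's put */
	pthread_mutex_unlock(&tl.lock);

	pthread_mutex_destroy(&tl.lock);
	return tl.recursive_lock_hits;
}
```

Here toy_timeline_destroy(true) reports one recursive acquisition and toy_timeline_destroy(false) reports none. On older glibc, build with -pthread.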

Jordan

> -Chris

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
