[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <439ba3f1-0b0b-7417-f306-c10935dbdb16@amd.com>
Date: Thu, 13 Aug 2020 08:52:09 +0200
From: Christian König <christian.koenig@....com>
To: Jordan Crouse <jcrouse@...eaurora.org>,
linux-arm-msm@...r.kernel.org
Cc: Gustavo Padovan <gustavo@...ovan.org>,
Sumit Semwal <sumit.semwal@...aro.org>,
dri-devel@...ts.freedesktop.org, linaro-mm-sig@...ts.linaro.org,
linux-kernel@...r.kernel.org, linux-media@...r.kernel.org
Subject: Re: [RFC PATCH v1] dma-fence-array: Deal with sub-fences that are
signaled late
Am 13.08.20 um 01:55 schrieb Jordan Crouse:
> This is an RFC because I'm still trying to grok the correct behavior.
>
> Consider a dma_fence_array created two two fence and signal_on_any is true.
> A reference to dma_fence_array is taken for each waiting fence.
Ok, that sounds like you seem to mix a couple of things up here.
A dma_fence_array takes the reference to the fences it contains on
creation. There is only one reference to the dma_fence_array even if it
contains N unsignaled fences.
What we do is to grab a reference to the array in
dma_fence_array_enable_signaling(), but this is because we are
registering the callback here.
> When the client calls dma_fence_wait() only one of the fences is signaled.
> The client returns successfully from the wait and puts it's reference to
> the array fence but the array fence still remains because of the remaining
> un-signaled fence.
If signaling was enabled then this is correct, because otherwise we
would crash when the other callbacks are called.
> Now consider that the unsignaled fence is signaled while the timeline is being
> destroyed much later. The timeline destroy calls dma_fence_signal_locked(). The
> following sequence occurs:
>
> 1) dma_fence_array_cb_func is called
>
> 2) array->num_pending is 0 (because it was set to 1 due to signal_on_any) so the
> callback function calls dma_fence_put() instead of triggering the irq work
>
> 3) The array fence is released which in turn puts the lingering fence which is
> then released
>
> 4) deadlock with the timeline
Why do we have a deadlock here? That doesn't seems to add up.
Christian.
>
> I think that we can fix this with the attached patch. Once the fence is
> signaled signaling it again in the irq worker shouldn't hurt anything. The only
> gotcha might be how the error is propagated - I wasn't quite sure the intent of
> clearing it only after getting to the irq worker.
>
> Signed-off-by: Jordan Crouse <jcrouse@...eaurora.org>
> ---
>
> drivers/dma-buf/dma-fence-array.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
> index d3fbd950be94..b8829b024255 100644
> --- a/drivers/dma-buf/dma-fence-array.c
> +++ b/drivers/dma-buf/dma-fence-array.c
> @@ -46,8 +46,6 @@ static void irq_dma_fence_array_work(struct irq_work *wrk)
> {
> struct dma_fence_array *array = container_of(wrk, typeof(*array), work);
>
> - dma_fence_array_clear_pending_error(array);
> -
> dma_fence_signal(&array->base);
> dma_fence_put(&array->base);
> }
> @@ -61,10 +59,10 @@ static void dma_fence_array_cb_func(struct dma_fence *f,
>
> dma_fence_array_set_pending_error(array, f->error);
>
> - if (atomic_dec_and_test(&array->num_pending))
> - irq_work_queue(&array->work);
> - else
> - dma_fence_put(&array->base);
> + if (!atomic_dec_and_test(&array->num_pending))
> + dma_fence_array_set_pending_error(array, f->error);
> +
> + irq_work_queue(&array->work);
> }
>
> static bool dma_fence_array_enable_signaling(struct dma_fence *fence)
Powered by blists - more mailing lists