lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20200812235544.2289895-1-jcrouse@codeaurora.org>
Date:   Wed, 12 Aug 2020 17:55:44 -0600
From:   Jordan Crouse <jcrouse@...eaurora.org>
To:     linux-arm-msm@...r.kernel.org
Cc:     Christian König <christian.koenig@....com>,
        Gustavo Padovan <gustavo@...ovan.org>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        dri-devel@...ts.freedesktop.org, linaro-mm-sig@...ts.linaro.org,
        linux-kernel@...r.kernel.org, linux-media@...r.kernel.org
Subject: [RFC PATCH v1] dma-fence-array: Deal with sub-fences that are signaled late

This is an RFC because I'm still trying to grok the correct behavior.

Consider a dma_fence_array created two two fence and signal_on_any is true.
A reference to dma_fence_array is taken for each waiting fence.

When the client calls dma_fence_wait() only one of the fences is signaled.
The client returns successfully from the wait and puts it's reference to
the array fence but the array fence still remains because of the remaining
un-signaled fence.

Now consider that the unsignaled fence is signaled while the timeline is being
destroyed much later. The timeline destroy calls dma_fence_signal_locked(). The
following sequence occurs:

1) dma_fence_array_cb_func is called

2) array->num_pending is 0 (because it was set to 1 due to signal_on_any) so the
callback function calls dma_fence_put() instead of triggering the irq work

3) The array fence is released which in turn puts the lingering fence which is
then released

4) deadlock with the timeline

I think that we can fix this with the attached patch. Once the fence is
signaled signaling it again in the irq worker shouldn't hurt anything. The only
gotcha might be how the error is propagated - I wasn't quite sure the intent of
clearing it only after getting to the irq worker.

Signed-off-by: Jordan Crouse <jcrouse@...eaurora.org>
---

 drivers/dma-buf/dma-fence-array.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index d3fbd950be94..b8829b024255 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -46,8 +46,6 @@ static void irq_dma_fence_array_work(struct irq_work *wrk)
 {
 	struct dma_fence_array *array = container_of(wrk, typeof(*array), work);
 
-	dma_fence_array_clear_pending_error(array);
-
 	dma_fence_signal(&array->base);
 	dma_fence_put(&array->base);
 }
@@ -61,10 +59,10 @@ static void dma_fence_array_cb_func(struct dma_fence *f,
 
 	dma_fence_array_set_pending_error(array, f->error);
 
-	if (atomic_dec_and_test(&array->num_pending))
-		irq_work_queue(&array->work);
-	else
-		dma_fence_put(&array->base);
+	if (!atomic_dec_and_test(&array->num_pending))
+		dma_fence_array_set_pending_error(array, f->error);
+
+	irq_work_queue(&array->work);
 }
 
 static bool dma_fence_array_enable_signaling(struct dma_fence *fence)
-- 
2.25.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ