lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20180716073444.276938236@linuxfoundation.org>
Date:   Mon, 16 Jul 2018 09:34:36 +0200
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     linux-kernel@...r.kernel.org
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org, Russell King <linux@...linux.org.uk>,
        Lucas Stach <l.stach@...gutronix.de>,
        Eric Anholt <eric@...olt.net>
Subject: [PATCH 4.17 07/67] drm/etnaviv: bring back progress check in job timeout handler

4.17-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lucas Stach <l.stach@...gutronix.de>

commit 2c83a726d6fbb5d130d8f2edd82a258adb675ac3 upstream.

When the hangcheck handler was replaced by the DRM scheduler timeout
handling we dropped the forward progress check, as this might allow
clients to hog the GPU for a long time with a big job.

It turns out that even reasonably well behaved clients like the
Armada Xorg driver occasionally trip over the 500ms timeout. Bring
back the forward progress check to get rid of the userspace regression.

We would still like to fix userspace to submit smaller batches
if possible, but that is for another day.

Cc: <stable@...r.kernel.org>
Fixes: 6d7a20c07760 (drm/etnaviv: replace hangcheck with scheduler timeout)
Reported-by: Russell King <linux@...linux.org.uk>
Signed-off-by: Lucas Stach <l.stach@...gutronix.de>
Reviewed-by: Eric Anholt <eric@...olt.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>

---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.h   |    3 +++
 drivers/gpu/drm/etnaviv/etnaviv_sched.c |   24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
@@ -142,6 +142,9 @@ struct etnaviv_gpu {
 	struct work_struct sync_point_work;
 	int sync_point_event;
 
+	/* hang detection */
+	u32 hangcheck_dma_addr;
+
 	void __iomem *mmio;
 	int irq;
 
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -21,6 +21,7 @@
 #include "etnaviv_gem.h"
 #include "etnaviv_gpu.h"
 #include "etnaviv_sched.h"
+#include "state.xml.h"
 
 static int etnaviv_job_hang_limit = 0;
 module_param_named(job_hang_limit, etnaviv_job_hang_limit, int , 0444);
@@ -96,6 +97,29 @@ static void etnaviv_sched_timedout_job(s
 {
 	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
 	struct etnaviv_gpu *gpu = submit->gpu;
+	u32 dma_addr;
+	int change;
+
+	/*
+	 * If the GPU managed to complete this jobs fence, the timout is
+	 * spurious. Bail out.
+	 */
+	if (fence_completed(gpu, submit->out_fence->seqno))
+		return;
+
+	/*
+	 * If the GPU is still making forward progress on the front-end (which
+	 * should never loop) we shift out the timeout to give it a chance to
+	 * finish the job.
+	 */
+	dma_addr = gpu_read(gpu, VIVS_FE_DMA_ADDRESS);
+	change = dma_addr - gpu->hangcheck_dma_addr;
+	if (change < 0 || change > 16) {
+		gpu->hangcheck_dma_addr = dma_addr;
+		schedule_delayed_work(&sched_job->work_tdr,
+				      sched_job->sched->timeout);
+		return;
+	}
 
 	/* block scheduler */
 	kthread_park(gpu->sched.thread);


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ