linux-kernel - [PATCH v2] drm/amdgpu: Disable RPM helpers while reprobing connectors on resume

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Date:	Mon, 18 Jul 2016 11:41:37 -0400
From:	Lyude <cpaul@...hat.com>
To:	amd-gfx@...ts.freedesktop.org
Cc:	Lyude <cpaul@...hat.com>, stable@...r.kernel.org,
	Alex Deucher <alexdeucher@...il.com>,
	Alex Deucher <alexander.deucher@....com>,
	Christian König <christian.koenig@....com>,
	David Airlie <airlied@...ux.ie>,
	Chunming Zhou <David1.Zhou@....com>,
	Jammy Zhou <Jammy.Zhou@....com>, Monk Liu <Monk.Liu@....com>,
	Flora Cui <Flora.Cui@....com>,
	Tom St Denis <tom.stdenis@....com>,
	dri-devel@...ts.freedesktop.org (open list:RADEON and AMDGPU DRM
	DRIVERS), linux-kernel@...r.kernel.org (open list)
Subject: [PATCH v2] drm/amdgpu: Disable RPM helpers while reprobing connectors on resume

Just about all of amdgpu's connector probing functions try to acquire
runtime PM refs. If we try to do this in the context of
amdgpu_resume_kms by calling drm_helper_hpd_irq_event(), we end up
deadlocking the system.

Since we're guaranteed to be holding the spinlock for RPM in
amdgpu_resume_kms, and we already know the GPU is in working order, we
need to prevent the RPM helpers from trying to run during the initial
connector reprobe on resume.

There's a couple of solutions I've explored for fixing this, but this
one by far seems to be the simplest and most reliable (plus I'm pretty
sure that's what disable_depth is there for anyway).

Reproduction recipe:
  - Get any laptop dual GPUs using PRIME
  - Make sure runtime PM is enabled for amdgpu
  - Boot the machine
  - If the machine managed to boot without hanging, switch out of X to
    another VT. This should definitely cause X to hang infinitely.

Changes since v1:
  - add appropriate #ifdef checks for CONFIG_PM. This is not very
    useful, but it appears some kernel test suites test compiling amdgpu
    with CONFIG_PM disabled, which results in this patch breaking the builds
    if we don't include this #ifdef

Cc: stable@...r.kernel.org
Cc: Alex Deucher <alexdeucher@...il.com>
Signed-off-by: Lyude <cpaul@...hat.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6e92008..b7f5650 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1841,7 +1841,23 @@ int amdgpu_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
 	}
 
 	drm_kms_helper_poll_enable(dev);
+
+	/*
+	 * Most of the connector probing functions try to acquire runtime pm
+	 * refs to ensure that the GPU is powered on when connector polling is
+	 * performed. Since we're calling this from a runtime PM callback,
+	 * trying to acquire rpm refs will cause us to deadlock.
+	 *
+	 * Since we're guaranteed to be holding the rpm lock, it's safe to
+	 * temporarily disable the rpm helpers so this doesn't deadlock us.
+	 */
+#ifdef CONFIG_PM
+	dev->dev->power.disable_depth++;
+#endif
 	drm_helper_hpd_irq_event(dev);
+#ifdef CONFIG_PM
+	dev->dev->power.disable_depth--;
+#endif
 
 	if (fbcon) {
 		amdgpu_fbdev_set_suspend(adev, 0);
-- 
2.7.4