[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250601232435.3507697-23-sashal@kernel.org>
Date: Sun, 1 Jun 2025 19:23:05 -0400
From: Sasha Levin <sashal@...nel.org>
To: patches@...ts.linux.dev,
stable@...r.kernel.org
Cc: Christoph Rudorff <chris@...orff.com>,
Lyude Paul <lyude@...hat.com>,
Sasha Levin <sashal@...nel.org>,
dakr@...nel.org,
airlied@...il.com,
simona@...ll.ch,
dri-devel@...ts.freedesktop.org,
nouveau@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Subject: [PATCH AUTOSEL 6.15 023/110] drm/nouveau: fix hibernate on disabled GPU
From: Christoph Rudorff <chris@...orff.com>
[ Upstream commit 4c4d9b7b6c6e676eca22585139aba5f03de74b90 ]
Hibernate bricks the machine if a discrete GPU was disabled via
echo IGD > /sys/kernel/debug/vgaswitcheroo/switch
The freeze and thaw handler lacks checking the GPU power state,
as suspend and resume do.
This patch add the checks and fix this issue.
Signed-off-by: Christoph Rudorff <chris@...orff.com>
Signed-off-by: Lyude Paul <lyude@...hat.com>
Link: https://lore.kernel.org/r/20250325-nouveau-fix-hibernate-v2-1-2bd5c13fb953@rudorff.com
Signed-off-by: Sasha Levin <sashal@...nel.org>
---
Based on my comprehensive analysis of both the commit changes and the
broader Linux kernel context, I can now provide my assessment: **YES**
This commit should be backported to stable kernel trees. Here's my
extensive analysis: ## **Critical Bug Fix Analysis** **1. User-Facing
System-Breaking Issue** - **Severity**: The commit fixes a machine-
bricking bug during hibernation - **Trigger**: Common user action (`echo
IGD > /sys/kernel/debug/vgaswitcheroo/switch`) - **Impact**: Complete
system failure requiring hard reset - **Affected Systems**: All
laptops/systems with discrete NVIDIA GPUs using nouveau driver **2.
Minimal, Surgical Code Changes** The fix is extremely contained and
safe: ```c + if (drm->dev->switch_power_state == DRM_SWITCH_POWER_OFF ||
+ drm->dev->switch_power_state == DRM_SWITCH_POWER_DYNAMIC_OFF) + return
0; ``` - **8 lines added** across 2 functions (`nouveau_pmops_freeze`
and `nouveau_pmops_thaw`) - **No complex logic** - simple early return
on disabled GPU - **No side effects** - only prevents hardware access
when GPU is already powered off - **Zero risk** - early return with
success code when hardware is inaccessible **3. Follows Established
Kernel Patterns** This change brings hibernation handlers in line with
existing patterns: - **AMD drivers**: Already implement identical checks
in suspend/resume - **Intel i915**: Uses same power state checking
methodology - **Nouveau suspend/resume**: Already had similar
protections - **Standard practice**: This pattern is used throughout the
DRM subsystem **4. Addresses Critical Gap in Power Management** -
**Missing parity**: Hibernation handlers lacked protection present in
suspend/resume - **Hardware protection**: Prevents accessing powered-off
GPU hardware - **VGA switcheroo integration**: Proper handling of multi-
GPU power states - **System stability**: Eliminates PCI bus hangs during
hibernation **5. Stable Tree Suitability Criteria Met** ✅ **Fixes
important user-facing bug**: Machine bricking during hibernation ✅
**Small and contained**: Only 8 lines, no architectural changes ✅
**Minimal regression risk**: Early returns are inherently safe ✅ **No
new features**: Pure bug fix for existing functionality ✅ **Clear cause
and effect**: Well-understood hardware access protection ✅ **Subsystem
confined**: Changes only affect nouveau hibernation path **6. Historical
Precedent** Similar commits have been backported (from provided
examples): - **drm/radeon hibernation fix**: "Cc:
stable@...r.kernel.org" for APU hibernation - **drm/amdgpu hibernation
fix**: "Cc: stable@...r.kernel.org" for s0ix hibernation **7. Risk
Assessment: Extremely Low** - **Cannot break working systems**: Only
affects disabled GPU hibernation path - **Early return pattern**:
Universally safe approach in kernel drivers - **Hardware protection**:
Prevents undefined behavior from hardware access - **Tested path**: Same
logic already proven in suspend/resume handlers ## **Conclusion** This
commit represents exactly the type of fix that stable trees are designed
for: a critical bug fix that prevents system failures, with minimal code
changes and virtually zero regression risk. The fix follows established
kernel patterns and addresses a serious gap in power management that
could leave users with completely unusable systems after hibernation
attempts. The commit should definitely be backported to prevent users
from experiencing machine-bricking hibernation failures on systems with
disabled discrete NVIDIA GPUs.
drivers/gpu/drm/nouveau/nouveau_drm.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index e154d08857c55..c69139701056d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -1079,6 +1079,10 @@ nouveau_pmops_freeze(struct device *dev)
{
struct nouveau_drm *drm = dev_get_drvdata(dev);
+ if (drm->dev->switch_power_state == DRM_SWITCH_POWER_OFF ||
+ drm->dev->switch_power_state == DRM_SWITCH_POWER_DYNAMIC_OFF)
+ return 0;
+
return nouveau_do_suspend(drm, false);
}
@@ -1087,6 +1091,10 @@ nouveau_pmops_thaw(struct device *dev)
{
struct nouveau_drm *drm = dev_get_drvdata(dev);
+ if (drm->dev->switch_power_state == DRM_SWITCH_POWER_OFF ||
+ drm->dev->switch_power_state == DRM_SWITCH_POWER_DYNAMIC_OFF)
+ return 0;
+
return nouveau_do_resume(drm, false);
}
--
2.39.5
Powered by blists - more mailing lists