linux-kernel - Re: [PATCH] drm/i915: fix failure to power off after hibernate

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150226200556.GF11371@intel.com>
Date:	Thu, 26 Feb 2015 22:05:56 +0200
From:	Ville Syrjälä <ville.syrjala@...ux.intel.com>
To:	Imre Deak <imre.deak@...el.com>
Cc:	Bjørn Mork <bjorn@...k.no>,
	linux-kernel@...r.kernel.org,
	Daniel Vetter <daniel.vetter@...el.com>,
	Jani Nikula <jani.nikula@...ux.intel.com>,
	intel-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
	stable@...r.kernel.org
Subject: Re: [PATCH] drm/i915: fix failure to power off after hibernate

On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote:
> On to, 2015-02-26 at 10:34 +0100, Bjørn Mork wrote:
> > Imre Deak <imre.deak@...el.com> writes:
> >
> > >> That patch fixes the problem, with only pci_set_power_state commented
> > >> out.  Do you still want me to try with pci_disable_device() commented
> > >> out as well?
> > >
> > > No, but it would help if you could still try the two attached patch
> > > separately, without any of the previous workarounds. Based on the
> > > result, we'll follow up with a fix that adds for your specific platform
> > > either of the attached workarounds or simply avoids putting the device
> > > into D3 (corresponding to the patch you already tried).
> >
> > None of those patches made any difference.  The laptop still hangs at
> > power-off.
> >
> > Not really surprising, is it?  Previous testing shows that the hang
> > occurs at the pci_set_power_state(drm_dev->pdev, PCI_D3hot) call in the
> > poweroff_late hook.  It is hard to see how replacing it by an attempt to
> > set D3cold, or adding any call after this point, could possibly change
> > anything.  The system is stil hanging at the pci_set_power_state() call.
> 
> Judging from the blinking LED, I assume that it's not
> pci_set_power_state() that hangs the machine, but the hang happens in
> BIOS code.
> 
> > The generic pci-driver code will put the i915 device into PCI_D3hot for
> > you, won't it? Why do you need to duplicate that in the driver,
> > *knowing* that doing so breaks (at least some) systems?
> 
> Letting the pci core put the device into D3 wouldn't get rid of the problem.
> It's putting the device into D3 in the first place what causes it.
> 
> > I honestly don't think this "let's try some random code" is the proper
> > way to fix this bug (or any other bug for that matter).  You need to
> > start understanding the code you write, and the first step is by
> > actually explaining the changes you make.
> 
> We have a good understanding about the issue: the BIOS on your platform
> does something unexpected behind the back of the driver/kernel. In that
> sense the patches were not random, for instance the first one is based on
> an existing workaround for an issue that resembles quite a lot yours, see
> pci_pm_poweroff_noirq().
> 
> > I also believe that you completely miss the fact that this bug has
> > survived a full release cycle before you became aware of it, and the
> > implications this has wrt other affected systems/users.  I assume you
> > understand that my system isn't one-of-a-kind, This means that there are
> > other affected users with identical/similar systems.  Now, if none of
> > those users reported the bug to you (we all know why: Linux kernel
> > development is currently limited by the available testing resources, NOT
> > by the available developer resources), then how do you know that there
> > aren't a number of other systems affected as well?
> >
> > Let me answer that for you:  You don't.
> >
> > Which is why you must explain the mechanism triggering the bug, proving
> > that it is a chipset/system specific issue.  Because that's the only way
> > you will *know* that you have solved the problem not only for me, but for
> > all affected users.
> >
> > IMHO, the only safe and sane solution at the moment is the revert patch
> > I posted.  It's a simple fix, reverting back to the *known* working
> > state before this regression was introduced.
> >
> > Then you can start over from there, trying to implement this properly.
> 
> The current way is the proper one that we want for the generic case. The issue
> on your platform is the exception, so working around that is a sensible choice.
> 
> Attached is the proposed fix for this issue.
> 
> --Imre
> 

> From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001
> From: Imre Deak <imre.deak@...el.com>
> Date: Thu, 26 Feb 2015 18:38:53 +0200
> Subject: [PATCH] drm/i915: gm45: work around hang during hibernation
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Bjørn reported that his machine hang during hibernation and eventually
> bisected the problem to the following commit:
> 
> commit da2bc1b9db3351addd293e5b82757efe1f77ed1d
> Author: Imre Deak <imre.deak@...el.com>
> Date:   Thu Oct 23 19:23:26 2014 +0300
> 
>     drm/i915: add poweroff_late handler
> 
> The problem seems to be that after the kernel puts the device into D3
> the BIOS still tries to access it, or otherwise assumes that it's in D0.
> This is clearly bogus, since ACPI mandates that devices are put into D3
> by the OSPM if they are not wake-up sources. In the future we want to
> unify more of the driver's runtime and system suspend paths, for example
> by skipping all the system suspend/hibernation hooks if the device is
> runtime suspended already. Accordingly for all other platforms the goal
> is still to properly power down the device during hibernation.
> 
> Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html
> Reported-and-bisected-by: Bjørn Mork <bjorn@...k.no>
> Signed-off-by: Imre Deak <imre.deak@...el.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 4badb23..67d212b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -637,7 +637,7 @@ static int i915_drm_suspend(struct drm_device *dev)
>  	return 0;
>  }
>  
> -static int i915_drm_suspend_late(struct drm_device *drm_dev)
> +static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation)
>  {
>  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
>  	int ret;
> @@ -651,7 +651,14 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev)
>  	}
>  
>  	pci_disable_device(drm_dev->pdev);
> -	pci_set_power_state(drm_dev->pdev, PCI_D3hot);
> +	/*
> +	 * During hibernation on some GM45 platforms the BIOS may try to access
> +	 * the device even though it's already in D3 and hang the machine. So
> +	 * leave the device in D0 on those platforms and hope the BIOS will
> +	 * power down the device properly.

Please include the model of the known bad machine in this comment, to
help future archaeologists.

> +	 */
> +	if (!(hibernation && IS_GM45(dev_priv)))
> +		pci_set_power_state(drm_dev->pdev, PCI_D3hot);
>  
>  	return 0;
>  }
> @@ -677,7 +684,7 @@ int i915_suspend_legacy(struct drm_device *dev, pm_message_t state)
>  	if (error)
>  		return error;
>  
> -	return i915_drm_suspend_late(dev);
> +	return i915_drm_suspend_late(dev, false);
>  }
>  
>  static int i915_drm_resume(struct drm_device *dev)
> @@ -965,7 +972,17 @@ static int i915_pm_suspend_late(struct device *dev)
>  	if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF)
>  		return 0;
>  
> -	return i915_drm_suspend_late(drm_dev);
> +	return i915_drm_suspend_late(drm_dev, false);
> +}
> +
> +static int i915_pm_poweroff_late(struct device *dev)
> +{
> +	struct drm_device *drm_dev = dev_to_i915(dev)->dev;
> +
> +	if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> +		return 0;
> +
> +	return i915_drm_suspend_late(drm_dev, true);
>  }
>  
>  static int i915_pm_resume_early(struct device *dev)
> @@ -1535,7 +1552,7 @@ static const struct dev_pm_ops i915_pm_ops = {
>  	.thaw_early = i915_pm_resume_early,
>  	.thaw = i915_pm_resume,
>  	.poweroff = i915_pm_suspend,
> -	.poweroff_late = i915_pm_suspend_late,
> +	.poweroff_late = i915_pm_poweroff_late,
>  	.restore_early = i915_pm_resume_early,
>  	.restore = i915_pm_resume,
>  
> -- 
> 2.1.0
> 


-- 
Ville Syrjälä
Intel OTC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/