[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150226200556.GF11371@intel.com>
Date: Thu, 26 Feb 2015 22:05:56 +0200
From: Ville Syrjälä <ville.syrjala@...ux.intel.com>
To: Imre Deak <imre.deak@...el.com>
Cc: Bjørn Mork <bjorn@...k.no>,
linux-kernel@...r.kernel.org,
Daniel Vetter <daniel.vetter@...el.com>,
Jani Nikula <jani.nikula@...ux.intel.com>,
intel-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
stable@...r.kernel.org
Subject: Re: [PATCH] drm/i915: fix failure to power off after hibernate
On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote:
> On to, 2015-02-26 at 10:34 +0100, Bjørn Mork wrote:
> > Imre Deak <imre.deak@...el.com> writes:
> >
> > >> That patch fixes the problem, with only pci_set_power_state commented
> > >> out. Do you still want me to try with pci_disable_device() commented
> > >> out as well?
> > >
> > > No, but it would help if you could still try the two attached patch
> > > separately, without any of the previous workarounds. Based on the
> > > result, we'll follow up with a fix that adds for your specific platform
> > > either of the attached workarounds or simply avoids putting the device
> > > into D3 (corresponding to the patch you already tried).
> >
> > None of those patches made any difference. The laptop still hangs at
> > power-off.
> >
> > Not really surprising, is it? Previous testing shows that the hang
> > occurs at the pci_set_power_state(drm_dev->pdev, PCI_D3hot) call in the
> > poweroff_late hook. It is hard to see how replacing it by an attempt to
> > set D3cold, or adding any call after this point, could possibly change
> > anything. The system is stil hanging at the pci_set_power_state() call.
>
> Judging from the blinking LED, I assume that it's not
> pci_set_power_state() that hangs the machine, but the hang happens in
> BIOS code.
>
> > The generic pci-driver code will put the i915 device into PCI_D3hot for
> > you, won't it? Why do you need to duplicate that in the driver,
> > *knowing* that doing so breaks (at least some) systems?
>
> Letting the pci core put the device into D3 wouldn't get rid of the problem.
> It's putting the device into D3 in the first place what causes it.
>
> > I honestly don't think this "let's try some random code" is the proper
> > way to fix this bug (or any other bug for that matter). You need to
> > start understanding the code you write, and the first step is by
> > actually explaining the changes you make.
>
> We have a good understanding about the issue: the BIOS on your platform
> does something unexpected behind the back of the driver/kernel. In that
> sense the patches were not random, for instance the first one is based on
> an existing workaround for an issue that resembles quite a lot yours, see
> pci_pm_poweroff_noirq().
>
> > I also believe that you completely miss the fact that this bug has
> > survived a full release cycle before you became aware of it, and the
> > implications this has wrt other affected systems/users. I assume you
> > understand that my system isn't one-of-a-kind, This means that there are
> > other affected users with identical/similar systems. Now, if none of
> > those users reported the bug to you (we all know why: Linux kernel
> > development is currently limited by the available testing resources, NOT
> > by the available developer resources), then how do you know that there
> > aren't a number of other systems affected as well?
> >
> > Let me answer that for you: You don't.
> >
> > Which is why you must explain the mechanism triggering the bug, proving
> > that it is a chipset/system specific issue. Because that's the only way
> > you will *know* that you have solved the problem not only for me, but for
> > all affected users.
> >
> > IMHO, the only safe and sane solution at the moment is the revert patch
> > I posted. It's a simple fix, reverting back to the *known* working
> > state before this regression was introduced.
> >
> > Then you can start over from there, trying to implement this properly.
>
> The current way is the proper one that we want for the generic case. The issue
> on your platform is the exception, so working around that is a sensible choice.
>
> Attached is the proposed fix for this issue.
>
> --Imre
>
> From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001
> From: Imre Deak <imre.deak@...el.com>
> Date: Thu, 26 Feb 2015 18:38:53 +0200
> Subject: [PATCH] drm/i915: gm45: work around hang during hibernation
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Bjørn reported that his machine hang during hibernation and eventually
> bisected the problem to the following commit:
>
> commit da2bc1b9db3351addd293e5b82757efe1f77ed1d
> Author: Imre Deak <imre.deak@...el.com>
> Date: Thu Oct 23 19:23:26 2014 +0300
>
> drm/i915: add poweroff_late handler
>
> The problem seems to be that after the kernel puts the device into D3
> the BIOS still tries to access it, or otherwise assumes that it's in D0.
> This is clearly bogus, since ACPI mandates that devices are put into D3
> by the OSPM if they are not wake-up sources. In the future we want to
> unify more of the driver's runtime and system suspend paths, for example
> by skipping all the system suspend/hibernation hooks if the device is
> runtime suspended already. Accordingly for all other platforms the goal
> is still to properly power down the device during hibernation.
>
> Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html
> Reported-and-bisected-by: Bjørn Mork <bjorn@...k.no>
> Signed-off-by: Imre Deak <imre.deak@...el.com>
> ---
> drivers/gpu/drm/i915/i915_drv.c | 27 ++++++++++++++++++++++-----
> 1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 4badb23..67d212b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -637,7 +637,7 @@ static int i915_drm_suspend(struct drm_device *dev)
> return 0;
> }
>
> -static int i915_drm_suspend_late(struct drm_device *drm_dev)
> +static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation)
> {
> struct drm_i915_private *dev_priv = drm_dev->dev_private;
> int ret;
> @@ -651,7 +651,14 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev)
> }
>
> pci_disable_device(drm_dev->pdev);
> - pci_set_power_state(drm_dev->pdev, PCI_D3hot);
> + /*
> + * During hibernation on some GM45 platforms the BIOS may try to access
> + * the device even though it's already in D3 and hang the machine. So
> + * leave the device in D0 on those platforms and hope the BIOS will
> + * power down the device properly.
Please include the model of the known bad machine in this comment, to
help future archaeologists.
> + */
> + if (!(hibernation && IS_GM45(dev_priv)))
> + pci_set_power_state(drm_dev->pdev, PCI_D3hot);
>
> return 0;
> }
> @@ -677,7 +684,7 @@ int i915_suspend_legacy(struct drm_device *dev, pm_message_t state)
> if (error)
> return error;
>
> - return i915_drm_suspend_late(dev);
> + return i915_drm_suspend_late(dev, false);
> }
>
> static int i915_drm_resume(struct drm_device *dev)
> @@ -965,7 +972,17 @@ static int i915_pm_suspend_late(struct device *dev)
> if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> return 0;
>
> - return i915_drm_suspend_late(drm_dev);
> + return i915_drm_suspend_late(drm_dev, false);
> +}
> +
> +static int i915_pm_poweroff_late(struct device *dev)
> +{
> + struct drm_device *drm_dev = dev_to_i915(dev)->dev;
> +
> + if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> + return 0;
> +
> + return i915_drm_suspend_late(drm_dev, true);
> }
>
> static int i915_pm_resume_early(struct device *dev)
> @@ -1535,7 +1552,7 @@ static const struct dev_pm_ops i915_pm_ops = {
> .thaw_early = i915_pm_resume_early,
> .thaw = i915_pm_resume,
> .poweroff = i915_pm_suspend,
> - .poweroff_late = i915_pm_suspend_late,
> + .poweroff_late = i915_pm_poweroff_late,
> .restore_early = i915_pm_resume_early,
> .restore = i915_pm_resume,
>
> --
> 2.1.0
>
--
Ville Syrjälä
Intel OTC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists