Message-ID: <alpine.DEB.2.20.1712131738020.1885@nanos>
Date: Wed, 13 Dec 2017 17:41:55 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: Bjorn Helgaas <helgaas@...nel.org>
cc: Maarten Lankhorst <dev@...ankhorst.nl>,
Michal Hocko <mhocko@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Andy Lutomirski <luto@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Daniel Vetter <daniel.vetter@...el.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
linux-pci@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3
On Wed, 13 Dec 2017, Bjorn Helgaas wrote:
> [+cc linux-pci, linux-pm]
>
> On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> > So I was finally able to figure out what the hell is going on:
> >
> > Suspend:
> >
> > - The device suspend code puts the graphics card into a power
> > state != PCI_D0.
> >
> > - Offline non boot CPUs
> >
> > - Break interrupt affinity. Allocate new vector on CPU 0, compose and
> > write MSI message which ends up in:
> >
> >     __pci_write_msi_msg(entry, msg)
> >     {
> >         if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
> >             /* Don't touch the hardware now */
> >         } else {
> >             ....
> >         }
> >         entry->msg = *msg;
> >     }
> >
> > So because the device is not in PCI_D0 the message is not written. It's
> > written in the device resume path.
>
> I'm not a PM guru, but this ordering seems fragile. If we offline
> CPUs before re-targeting interrupts directed at those CPUs, aren't we
> always going to be at risk of sending interrupts to an offline CPU?
>
> Even if the device is now asleep and therefore should not generate an
> interrupt, it seems like there's a window when the device returns to
> PCI_D0 where it could generate an interrupt before we have a chance to
> update the MSI message.
Definitely. That has been fragile forever, but what puzzles me is that I can't
figure out what now causes that spurious interrupt to surface out of the blue.
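
To make the window concrete, here is a rough user-space model of the sequence quoted above. It is only a sketch, not kernel code: every type and function name in it (dev_model, model_write_msi_msg and so on) is made up for illustration, the CPU and vector numbers are arbitrary, and the pci_dev_is_disconnected() check is left out. It only assumes what the quoted snippet shows, namely that the message is cached but not written to the hardware while the device is not in D0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these are the kernel's own. */
enum power_state { STATE_D0, STATE_D3 };

struct msi_msg_model {
	unsigned int dest_cpu;
	unsigned int vector;
};

struct dev_model {
	enum power_state current_state;
	struct msi_msg_model cached;	/* stands in for entry->msg */
	struct msi_msg_model hw;	/* what the device would actually use */
};

/*
 * Mirrors the quoted logic: the hardware is only touched in D0, but the
 * cached copy is always updated. The disconnected check is omitted.
 */
static void model_write_msi_msg(struct dev_model *dev,
				const struct msi_msg_model *msg)
{
	assert(dev != NULL && msg != NULL);
	if (dev->current_state == STATE_D0)
		dev->hw = *msg;
	dev->cached = *msg;
}

/* True while the device still holds a message that differs from the cache. */
static bool model_hw_is_stale(const struct dev_model *dev)
{
	return dev->hw.dest_cpu != dev->cached.dest_cpu ||
	       dev->hw.vector != dev->cached.vector;
}

/* Resume, split in two steps so the gap between them is visible. */
static void model_set_d0(struct dev_model *dev)
{
	dev->current_state = STATE_D0;
}

static void model_restore_msi(struct dev_model *dev)
{
	dev->hw = dev->cached;
}

/* Runs the suspend/resume sequence; returns true if the stale window showed up. */
static bool model_window_exists(void)
{
	/* Device starts in D0 with its interrupt aimed at (arbitrary) CPU 3. */
	struct dev_model dev = { STATE_D0, { 3, 0x41 }, { 3, 0x41 } };
	struct msi_msg_model retarget = { 0, 0x22 };
	bool stale;

	dev.current_state = STATE_D3;		/* device suspend */
	model_write_msi_msg(&dev, &retarget);	/* CPU 3 offlined, affinity broken */

	model_set_d0(&dev);			/* device is back in D0 ... */
	stale = model_hw_is_stale(&dev);	/* ... but still targets CPU 3 */

	model_restore_msi(&dev);		/* resume path writes the cached msg */
	return stale && !model_hw_is_stale(&dev);
}
```

Calling model_window_exists() returns true in this model: between model_set_d0() and model_restore_msi() the device copy still points at the offlined CPU, which is the window Bjorn describes. The model says nothing about what makes the interrupt actually fire in that window.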
Thanks,
tglx