lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.20.1712131738020.1885@nanos>
Date:   Wed, 13 Dec 2017 17:41:55 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Bjorn Helgaas <helgaas@...nel.org>
cc:     Maarten Lankhorst <dev@...ankhorst.nl>,
        Michal Hocko <mhocko@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Andy Lutomirski <luto@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Daniel Vetter <daniel.vetter@...el.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        linux-pci@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, 13 Dec 2017, Bjorn Helgaas wrote:
> [+cc linux-pci, linux-pm]
> 
> On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> > So I was finally able to figure out what the hell is going on:
> > 
> > Suspend:
> > 
> >  - The device suspend code puts the graphics card into a power
> >    state != PCI_D0.
> > 
> >  - Offline non boot CPUs
> > 
> >  - Break interrupt affinity. Allocate new vector on CPU 0, compose and
> >    write MSI message which ends up in:
> > 
> >    __pci_write_msi_msg(entry, msg)
> >    {
> > 	if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
> > 	   /* Don't touch the hardware now */
> > 	} else {
> > 	   ....
> > 	}
> > 	entry->msg = *msg;
> >    }
> >  
> >   So because the device is not in PCI_D0 the message is not written. It's
> >   written in the device resume path.
> 
> I'm not a PM guru, but this ordering seems fragile.  If we offline
> CPUs before re-targeting interrupts directed at those CPUs, aren't we
> always going to be at risk of sending interrupts to an offline CPU?
> 
> Even if the device is now asleep and therefore should not generate an
> interrupt, it seems like there's a window when the device returns to
> PCI_D0 where it could generate an interrupt before we have a chance to
> update the MSI message.

Definitely. That was fragile forever but puzzles me is that I can't figure
out what now causes that spurious interrupt to surface out of the blue.

Thanks,

	tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ