Message-ID: <alpine.DEB.2.20.1712131507160.1885@nanos>
Date:   Wed, 13 Dec 2017 16:57:56 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Maarten Lankhorst <dev@...ankhorst.nl>
cc:     Michal Hocko <mhocko@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Andy Lutomirski <luto@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Daniel Vetter <daniel.vetter@...el.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

So I was finally able to figure out what the hell is going on:

Suspend:

 - The device suspend code puts the graphics card into a power
   state != PCI_D0.

 - Offline the non-boot CPUs.

 - Break interrupt affinity. Allocate a new vector on CPU 0, then compose
   and write the MSI message, which ends up in:

   __pci_write_msi_msg(entry, msg)
   {
	if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
	   /* Don't touch the hardware now */
	} else {
	   ....
	}
	entry->msg = *msg;
   }
 
  Because the device is not in PCI_D0, the message is not written to the
  hardware; it is only cached in entry->msg. The actual write happens later,
  in the device resume path.

Resume:
[  139.670446] ACPI: Low-level resume complete
[  139.670541] PM: Restoring platform NVS memory
[  139.672462] do_IRQ: 0.55 No irq handler for vector
[  139.672475] Enabling non-boot CPUs ...

So the spurious interrupt happens early, well before the device resume
code writes the new MSI message.
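
To make the window explicit, here is a minimal userspace sketch of the
sequence above. It is a toy model, not kernel code: every name in it
(dev_model, model_write_msi_msg, model_resume, model_hw_is_stale) is made up
for illustration, and the vector numbers used with it are arbitrary. It only
mirrors the shape of the quoted __pci_write_msi_msg() check, and it says
nothing about why the device sends an interrupt in that window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for illustration; not the kernel's real types. */
enum power_state { STATE_D0, STATE_D3 };

struct msi_msg_model {
	uint32_t vector;	/* only the field that matters for this sketch */
};

struct dev_model {
	enum power_state current_state;
	struct msi_msg_model hw_msg;	 /* what the device would signal */
	struct msi_msg_model cached_msg; /* plays the role of entry->msg */
};

/* Same shape as the quoted check: skip the hardware unless in D0. */
static void model_write_msi_msg(struct dev_model *dev,
				const struct msi_msg_model *msg)
{
	if (dev->current_state == STATE_D0)
		dev->hw_msg = *msg;	/* device is accessible, write it */
	/* The software copy is always updated. */
	dev->cached_msg = *msg;
}

/* Device resume path: back to D0, then replay the cached message. */
static void model_resume(struct dev_model *dev)
{
	assert(dev);
	dev->current_state = STATE_D0;
	dev->hw_msg = dev->cached_msg;
}

/* True while the device still holds a message for the old vector. */
static bool model_hw_is_stale(const struct dev_model *dev)
{
	return dev->hw_msg.vector != dev->cached_msg.vector;
}
```

In this model, a device that is put into STATE_D3 and then gets a new message
via model_write_msi_msg() keeps the old vector in hw_msg until model_resume()
runs. Any interrupt the device raises in between would target the old vector,
which would match the "No irq handler for vector" message in the log.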

I checked the behaviour on 4.14. The MSI write is delayed there in the same
way, but there is no spurious interrupt. There is no interrupt coming in at
all _BEFORE_ the device is put out of PCI_D0.

And this certainly has nothing to do with the vector management changes,
but I can't figure out yet what causes that spurious interrupt to be sent.

Any ideas welcome.

Thanks,

	tglx
