Message-ID: <6368800.kWPUrNViPU@aspire.rjw.lan>
Date:   Wed, 06 Dec 2017 15:04:43 +0100
From:   "Rafael J. Wysocki" <rjw@...ysocki.net>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wednesday, December 6, 2017 1:23:34 PM CET Thomas Gleixner wrote:
> On Wed, 6 Dec 2017, Michal Hocko wrote:
> > merging tip/x86/urgent on top of your tree fixed this problem for me,
> > but I am seeing something else
> > [  131.711412] ACPI: Preparing to enter system sleep state S3
> > [  131.755328] ACPI: EC: event blocked
> > [  131.755328] ACPI: EC: EC stopped
> > [  131.755328] PM: Saving platform NVS memory
> > [  131.755344] Disabling non-boot CPUs ...
> > [  131.779330] IRQ 124: no longer affine to CPU1
> > [  131.780334] smpboot: CPU 1 is now offline
> > [  131.804465] smpboot: CPU 2 is now offline
> > [  131.827291] IRQ 122: no longer affine to CPU3
> > [  131.827292] IRQ 123: no longer affine to CPU3
> > [  131.828293] smpboot: CPU 3 is now offline
> > [  131.830991] ACPI: Low-level resume complete
> > [  131.831092] ACPI: EC: EC started
> > [  131.831093] PM: Restoring platform NVS memory
> > [  131.831864] do_IRQ: 0.55 No irq handler for vector
> 
> Hmm, that's really odd.
> 
> > [  131.831884] Enabling non-boot CPUs ...
> > [  131.831909] x86: Booting SMP configuration:
> > [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [  131.832913]  cache: parent cpu1 should not be sleeping
> 
> This is an old one. 
> 
> > [  131.833058] CPU1 is up
> > [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> > [  131.833864]  cache: parent cpu2 should not be sleeping
> > [  131.833983] CPU2 is up
> > [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > [  131.834776]  cache: parent cpu3 should not be sleeping
> > [  131.834923] CPU3 is up
> > 
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
> 
> I assume it's due to the affinity break, just that we don't know right now
> on which CPU that do_IRQ() message triggered. I assume it's CPU0 because
> the others are offline already, but ....

This is resume from S3, so the firmware might do something odd to the other
CPUs, but in case it didn't (which is quite likely or we would have seen more
of these messages), they are offline and in mwait_play_dead(), so IMO it is
safe to assume that this was CPU0.
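
A minimal sketch of that reasoning, replaying an excerpt of the log quoted
above to see which CPUs the kernel still considered online when the do_IRQ
message was printed. The helper name online_cpus_at() and the assumption of
four CPUs (0-3) are illustrative only; this reflects the kernel's own
bookkeeping in the log, not whatever the firmware may have done during S3.

```python
import re

# Excerpt of the dmesg output quoted earlier in this thread.
LOG = """\
[  131.755344] Disabling non-boot CPUs ...
[  131.780334] smpboot: CPU 1 is now offline
[  131.804465] smpboot: CPU 2 is now offline
[  131.828293] smpboot: CPU 3 is now offline
[  131.830991] ACPI: Low-level resume complete
[  131.831864] do_IRQ: 0.55 No irq handler for vector
[  131.831884] Enabling non-boot CPUs ...
[  131.833058] CPU1 is up
"""

OFFLINE = re.compile(r"CPU (\d+) is now offline")
ONLINE = re.compile(r"CPU(\d+) is up")


def online_cpus_at(log, marker, all_cpus):
    """Return the set of CPUs logged as online when `marker` first appears.

    Hypothetical helper: assumes every CPU in `all_cpus` starts online and
    that hotplug transitions are reported with the two message formats above.
    Returns None if `marker` never appears in the log.
    """
    online = set(all_cpus)
    for line in log.splitlines():
        if marker in line:
            return online
        m = OFFLINE.search(line)
        if m:
            online.discard(int(m.group(1)))
            continue
        m = ONLINE.search(line)
        if m:
            online.add(int(m.group(1)))
    return None


print(sorted(online_cpus_at(LOG, "No irq handler", range(4))))  # prints [0]
```

On this excerpt only CPU0 remains, which matches the conclusion above.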

And this appears to have happened at the arch_suspend_enable_irqs() time,
which is just local_irq_enable() on x86 running on CPU0.

> I'll think about it how we can figure out what's going on.

It looks like an interrupt that has triggered right after we've enabled
interrupts on the boot CPU.

Thanks,
Rafael
