Date:   Sat, 11 Feb 2017 18:48:27 -0500
From:   Woody Suwalski <terraluna977@...il.com>
To:     Pavel Machek <pavel@....cz>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>
Cc:     kernel list <linux-kernel@...r.kernel.org>, tglx@...utronix.de,
        mingo@...hat.com, hpa@...or.com
Subject: Re: 4.10-rc1: thinkpad x60: who ate my cpu?

Pavel Machek wrote:
> On Sat 2017-01-14 12:30:54, Pavel Machek wrote:
>> Hi!
>>
>> On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
>>> Pavel Machek wrote:
>>>> Hi!
>>>>
>>>>> I used to have two cpus, and Thinkpad X60 should have two cores, but I
>>>>> only see one on 4.10-rc1. This machine went through many
>>>>> suspend/resume cycles. When backups finish, I'll try -rc2.
>>>> Whoever did it, he seems to have returned the cpu in -rc3. All seems
>>>> to be good now.
>>> Actually since you have mentioned - I have checked my x60 - same problem -
>>> only one CPU. However I was running 4.8.13 with uptime 33 days, multiple
>>> sleep/wake-ups.
>>> Installed a current EOL 4.8.17 and rebooted - I see 2 CPUs. So the issue is
>>> older than the 4.10 kernel, and I suspect it is CPU hotplug / wakeup
>>> related...
>> Hmm. So I saw two cores in -rc3 after boot. But it is quite
>> possible that -rc1 was ok just after boot, too, and the problem happened
>> sometime later (probably during suspend/resume cycles). Let me go back
>> to -rc1 to check.
> Indeed in -rc1 I see both CPUs after boot. So we have a hard-to-reproduce
> case where 4.8 to 4.10 kernels lose one of the cpu cores...
>
>
>
Managed to reproduce it, but again it took a long time: the uptime is
29 days.
It must have happened within the last day, as I kept checking as often as I
remembered.

The kernel is 4.8.17 EOL, installed almost a month ago.
Platform: ThinkPad X60, Intel(R) Core(TM) Duo CPU T2400 @ 1.83GHz

In dmesg, this is what a suspend/resume cycle looked like while both CPUs were OK:
[690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
[690409.476547] ACPI: Preparing to enter system sleep state S3
[690409.780081] ACPI : EC: EC stopped
[690409.780083] PM: Saving platform NVS memory
[690409.780284] Disabling non-boot CPUs ...
[690409.805284] smpboot: CPU 1 is now offline
[690409.816464] ACPI: Low-level resume complete
[690409.816464] ACPI : EC: EC started
[690409.816464] PM: Restoring platform NVS memory
[690409.816464] Enabling non-boot CPUs ...
[690409.840574] x86: Booting SMP configuration:
[690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
[690409.805271] Initializing CPU#1
[690409.805271] Disabled fast string operations
[690409.888252]  cache: parent cpu1 should not be sleeping
[690409.920185] CPU1 is up
[690409.922288] ACPI: Waking up from system sleep state S3

Then, on a later resume, CPU1 failed to start:

[691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
[691329.776550] ACPI: Preparing to enter system sleep state S3
[691330.080081] ACPI : EC: EC stopped
[691330.080083] PM: Saving platform NVS memory
[691330.080284] Disabling non-boot CPUs ...
[691330.105303] smpboot: CPU 1 is now offline
[691330.116477] ACPI: Low-level resume complete
[691330.116477] ACPI : EC: EC started
[691330.116477] PM: Restoring platform NVS memory
[691330.116477] Enabling non-boot CPUs ...
[691330.140570] x86: Booting SMP configuration:
[691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
[691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
[691340.164445] Error taking CPU1 up: -5
[691340.166309] ACPI: Waking up from system sleep state S3
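
Since the failure only shows up after weeks of uptime, it may help to scan saved dmesg output for the two failure lines above rather than checking by hand. The following is a minimal sketch, not a definitive tool: the function name find_cpu_bringup_failures is hypothetical, and the patterns are taken only from the excerpt above, so other kernels may word these messages differently.

```python
import re

# Timestamp prefix as printed by dmesg, e.g. "[691340.140015] ".
TS = r"^\[\s*(?P<ts>\d+\.\d+)\]\s+"

# Both patterns are derived from the log excerpt in this message only.
WAKEUP_FAILED = re.compile(
    TS + r"smpboot: do_boot_cpu failed\((?P<rc>-?\d+)\) to wakeup CPU#(?P<cpu>\d+)"
)
TAKE_UP_FAILED = re.compile(
    TS + r"Error taking CPU(?P<cpu>\d+) up: (?P<rc>-?\d+)"
)


def find_cpu_bringup_failures(lines):
    """Return (timestamp, cpu, error_code) for each failed CPU bring-up line."""
    failures = []
    for line in lines:
        for pattern in (WAKEUP_FAILED, TAKE_UP_FAILED):
            m = pattern.match(line)
            if m:
                failures.append(
                    (float(m.group("ts")), int(m.group("cpu")), int(m.group("rc")))
                )
    return failures
```

One way to use it would be to save the output of dmesg to a file and pass the file's lines to the function; the first reported timestamp marks the resume on which the CPU was lost.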

And now each suspend/resume cycle looks like this; no attempt to bring CPU1 back up is logged:
[692517.868523] ACPI: Preparing to enter system sleep state S3
[692518.172074] ACPI : EC: EC stopped
[692518.172076] PM: Saving platform NVS memory
[692518.172269] Disabling non-boot CPUs ...
[692518.172269] ACPI: Low-level resume complete
[692518.172269] ACPI : EC: EC started
[692518.172269] PM: Restoring platform NVS memory
[692518.172269] ACPI: Waking up from system sleep state S3

Is there any test I could do on the CPU wakeup while in that state?
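
One check that might be worth running in that state is to compare the kernel's "present" and "online" CPU lists in sysfs. This is a sketch under assumptions: the helper names parse_cpu_list and offline_cpus are hypothetical, and it only reports which CPUs are present but offline; it does not explain why the wakeup failed. As root, writing 1 to /sys/devices/system/cpu/cpu1/online is the usual sysfs way to ask the kernel to retry bringing CPU1 up, and the resulting dmesg lines could then be compared with the failure above.

```python
from pathlib import Path

# Standard sysfs location for CPU topology and hotplug state.
CPU_SYSFS = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-1" or "0,2-3" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def offline_cpus(base=CPU_SYSFS):
    """Return the sorted CPUs that are present but not currently online."""
    present = parse_cpu_list((base / "present").read_text())
    online = parse_cpu_list((base / "online").read_text())
    return sorted(present - online)
```

Calling offline_cpus() from a cron job or after each resume would record when CPU1 disappears, instead of relying on manual checks.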

Woody
