Date:   Sun, 12 Feb 2017 10:43:56 -0500
From:   Woody Suwalski <terraluna977@...il.com>
To:     Pavel Machek <pavel@....cz>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>
Cc:     kernel list <linux-kernel@...r.kernel.org>, tglx@...utronix.de,
        mingo@...hat.com, hpa@...or.com
Subject: Re: 4.10-rc1: thinkpad x60: who ate my cpu?

Woody Suwalski wrote:
> Pavel Machek wrote:
>> On Sat 2017-01-14 12:30:54, Pavel Machek wrote:
>>> Hi!
>>>
>>> On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
>>>> Pavel Machek wrote:
>>>>> Hi!
>>>>>
>>>>>> I used to have two CPUs, and a ThinkPad X60 should have two cores,
>>>>>> but I only see one on 4.10-rc1. This machine went through many
>>>>>> suspend/resume cycles. When backups finish, I'll try -rc2.
>>>>> Whoever took it seems to have returned the CPU in -rc3. All seems
>>>>> to be good now.
>>>> Actually, since you mention it, I checked my X60: same problem,
>>>> only one CPU. However, I was running 4.8.13 with an uptime of 33
>>>> days and multiple sleep/wake-ups.
>>>> I installed the current EOL 4.8.17 and rebooted, and I see 2 CPUs.
>>>> So the issue is older than the 4.10 kernel, and I suspect it is
>>>> related to CPU hotplug / wakeup...
>>> Hmm. So I saw two cores in -rc3 after boot. But it is quite possible
>>> that -rc1 was OK just after boot, too, and the problem happened
>>> sometime later (probably during suspend/resume cycles). Let me go back
>>> to -rc1 to check.
>> Indeed, in -rc1 I see both CPUs after boot. So we have a
>> hard-to-reproduce case where 4.8 to 4.10 kernels lose one of the
>> CPU cores...
>>
> Managed to reproduce it, but again it took a long time: I have an
> uptime of 29 days.
> It must have happened within the last day, as I kept checking as often
> as I remembered to.
>
> The kernel is 4.8.17 EOL, installed almost a month ago.
> Platform: ThinkPad X60, Intel(R) Core(TM) Duo CPU T2400 @ 1.83GHz
>
> In dmesg, this is what a suspend/resume cycle looked like while both CPUs were OK:
> [690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
> [690409.476547] ACPI: Preparing to enter system sleep state S3
> [690409.780081] ACPI : EC: EC stopped
> [690409.780083] PM: Saving platform NVS memory
> [690409.780284] Disabling non-boot CPUs ...
> [690409.805284] smpboot: CPU 1 is now offline
> [690409.816464] ACPI: Low-level resume complete
> [690409.816464] ACPI : EC: EC started
> [690409.816464] PM: Restoring platform NVS memory
> [690409.816464] Enabling non-boot CPUs ...
> [690409.840574] x86: Booting SMP configuration:
> [690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [690409.805271] Initializing CPU#1
> [690409.805271] Disabled fast string operations
> [690409.888252]  cache: parent cpu1 should not be sleeping
> [690409.920185] CPU1 is up
> [690409.922288] ACPI: Waking up from system sleep state S3
>
> Then, on a later resume, CPU1 failed to start:
>
> [691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
> [691329.776550] ACPI: Preparing to enter system sleep state S3
> [691330.080081] ACPI : EC: EC stopped
> [691330.080083] PM: Saving platform NVS memory
> [691330.080284] Disabling non-boot CPUs ...
> [691330.105303] smpboot: CPU 1 is now offline
> [691330.116477] ACPI: Low-level resume complete
> [691330.116477] ACPI : EC: EC started
> [691330.116477] PM: Restoring platform NVS memory
> [691330.116477] Enabling non-boot CPUs ...
> [691330.140570] x86: Booting SMP configuration:
> [691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
> [691340.164445] Error taking CPU1 up: -5
> [691340.166309] ACPI: Waking up from system sleep state S3
>
> And now it looks like this (CPU1 is no longer taken offline or brought back up at all):
> [692517.868523] ACPI: Preparing to enter system sleep state S3
> [692518.172074] ACPI : EC: EC stopped
> [692518.172076] PM: Saving platform NVS memory
> [692518.172269] Disabling non-boot CPUs ...
> [692518.172269] ACPI: Low-level resume complete
> [692518.172269] ACPI : EC: EC started
> [692518.172269] PM: Restoring platform NVS memory
> [692518.172269] ACPI: Waking up from system sleep state S3
>
> Is there any test I could run on the CPU wakeup while in that state?
>
> Woody
>
Is there a way to kick the offline CPU back into operation from the /sys level?
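A minimal sketch of what that could look like, assuming the standard Linux CPU-hotplug sysfs files under /sys/devices/system/cpu: `present` and `online` hold CPU lists such as "0-1", and writing "1" to `cpuN/online` as root requests bringing CPU N up (the same as `echo 1 > /sys/devices/system/cpu/cpu1/online`). The helper names (`parse_cpu_list`, `offline_cpus`, `try_online`, `report_and_retry`) are hypothetical. Whether this actually revives CPU1 on the X60 once do_boot_cpu has failed is untested; if the bring-up fails again, the write would be expected to return an error, much like the "Error taking CPU1 up: -5" above.

```python
# Hypothetical helper: list CPUs that are present but offline and ask the
# kernel to bring them back. Assumes the standard CPU-hotplug sysfs layout:
#   <base>/present, <base>/online  -> CPU lists such as "0-1" or "0,2-3"
#   <base>/cpuN/online             -> write "1" to request CPU N online (root)
import os
import sys

SYSFS_CPU = "/sys/devices/system/cpu"


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-1,4" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list (e.g. "\n") yields no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_cpu_list(base, name):
    """Read and parse one CPU-list file, e.g. <base>/online."""
    with open(os.path.join(base, name)) as f:
        return parse_cpu_list(f.read())


def offline_cpus(base=SYSFS_CPU):
    """Return CPUs the kernel reports as present but not online, sorted."""
    return sorted(read_cpu_list(base, "present") - read_cpu_list(base, "online"))


def try_online(cpu, base=SYSFS_CPU):
    """Write "1" to <base>/cpuN/online; return True if the write was accepted.

    An accepted write is not proof the core is running: re-read <base>/online
    and check dmesg afterwards.
    """
    path = os.path.join(base, f"cpu{cpu}", "online")
    try:
        # The buffered write is flushed on close, so a kernel error (e.g. EIO)
        # or a permission problem surfaces here as OSError.
        with open(path, "w") as f:
            f.write("1")
        return True
    except OSError as e:
        print(f"cpu{cpu}: {e}", file=sys.stderr)
        return False


def report_and_retry(base=SYSFS_CPU):
    """Print offline CPUs, try to re-online each, then print the result."""
    missing = offline_cpus(base)
    print("offline CPUs:", missing if missing else "none")
    for cpu in missing:
        status = "accepted" if try_online(cpu, base) else "rejected"
        print(f"cpu{cpu}: write {status}")
    print("online now:", sorted(read_cpu_list(base, "online")))
```

The idea would be to call report_and_retry() as root right after a resume where CPU1 has gone missing, then compare its output with what dmesg logs for that attempt. The `base` parameter only exists so the logic can be exercised against a fake directory tree instead of the real sysfs.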
