lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <471D3C26.3090602@googlemail.com>
Date:	Tue, 23 Oct 2007 02:11:18 +0200
From:	Gabriel C <nix.or.die@...glemail.com>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
CC:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-pm@...ts.linux-foundation.org, linux-acpi@...r.kernel.org
Subject: Re: Resume problems

Gabriel C wrote:
> Rafael J. Wysocki wrote:
>> On Tuesday, 23 October 2007 01:00, Gabriel C wrote:
>>> Rafael J. Wysocki wrote:
>>>> On Monday, 22 October 2007 18:15, Gabriel C wrote:
>>>>> Hi all ,
>>>>>
>>>>> I'm running current git + aic7xxx suspend patch from  http://bugzilla.kernel.org/show_bug.cgi?id=3062
>>>>> on a Dell Precision WorkStation 530 MT SMP box ( HT enabled ).
>>>>>
>>>>> Suspend works fine but on resume I have some problems. 
>>>>> All CPU's but boot CPU won't come back , everything else seems fine.
>>>> Can you please try to disable HT and suspend?
>>> So only 'Hibernation' is enabled in kernel and HT disabled in BIOS ?
>>>
>>> If you mean that , sure I can try doing so. 
>> With suspend or hibernation enabled in the kernel, but with HT disabled in the
>> BIOS.
> 
> Ok trying in some minutes.

Disabling HT does not make any difference , nor disabling / enabling only one Hibernation or Suspend in kernel and BIOS
nor any combination of these.
 
> 
>>> I also could disable Suspend to RAM completly from BIOS as well if you want.
>> No, that rather won't work.
>>
>>>>> ...
>>>>>
>>>>> Oct 22 15:02:28 lara [   49.618795] Enabling non-boot CPUs ...
>>>>> Oct 22 15:02:28 lara [   49.622211] PM: Adding info for No Bus:msr1
>>>>> Oct 22 15:02:28 lara [   49.622259] PM: Adding info for No Bus:cpu1
>>>>> Oct 22 15:02:28 lara [   49.622302] SMP alternatives: switching to SMP code
>>>>> Oct 22 15:02:28 lara [   49.623536] Booting processor 1/1 eip 3000
>>>>> Oct 22 15:02:28 lara [   54.638093] Not responding.
>>>>> Oct 22 15:02:28 lara [   54.638096] Inquiring remote APIC #1...
>>>>> Oct 22 15:02:28 lara [   54.638099] ... APIC #1 ID: failed
>>>>> Oct 22 15:02:28 lara [   54.638204] ... APIC #1 VERSION: failed
>>>>> Oct 22 15:02:28 lara [   54.638307] ... APIC #1 SPIV: failed
>>>>> Oct 22 15:02:28 lara [   54.638427] skipping cpu1, didn't come online
>>>>> Oct 22 15:02:28 lara [   54.638602] PM: Removing info for No Bus:msr1
>>>>> Oct 22 15:02:28 lara [   54.638643] PM: Removing info for No Bus:cpu1
>>>>> Oct 22 15:02:28 lara [   54.638678] Error taking CPU1 up: -5
>>>>> Oct 22 15:02:28 lara [   54.640908] PM: Adding info for No Bus:msr2
>>>>> Oct 22 15:02:28 lara [   54.640939] PM: Adding info for No Bus:cpu2
>>>>> Oct 22 15:02:28 lara [   54.640976] SMP alternatives: switching to SMP code
>>>>> Oct 22 15:02:28 lara [   54.641961] Booting processor 2/2 eip 3000
>>>>> Oct 22 15:02:28 lara [   59.656795] Not responding.
>>>>> Oct 22 15:02:28 lara [   59.656799] Inquiring remote APIC #2...
>>>>> Oct 22 15:02:28 lara [   59.656803] ... APIC #2 ID: failed
>>>>> Oct 22 15:02:28 lara [   59.656907] ... APIC #2 VERSION: failed
>>>>> Oct 22 15:02:28 lara [   59.657011] ... APIC #2 SPIV: failed
>>>>> Oct 22 15:02:28 lara [   59.657131] skipping cpu2, didn't come online
>>>>> Oct 22 15:02:28 lara [   59.657300] PM: Removing info for No Bus:msr2
>>>>> Oct 22 15:02:28 lara [   59.657343] PM: Removing info for No Bus:cpu2
>>>>> Oct 22 15:02:28 lara [   59.657379] Error taking CPU2 up: -5
>>>>> Oct 22 15:02:28 lara [   59.659605] PM: Adding info for No Bus:msr3
>>>>> Oct 22 15:02:28 lara [   59.659637] PM: Adding info for No Bus:cpu3
>>>>> Oct 22 15:02:28 lara [   59.659673] SMP alternatives: switching to SMP code
>>>>> Oct 22 15:02:28 lara [   59.660725] Booting processor 3/3 eip 3000
>>>>> Oct 22 15:02:28 lara [   64.675517] Not responding.
>>>>> Oct 22 15:02:28 lara [   64.675520] Inquiring remote APIC #3...
>>>>> Oct 22 15:02:28 lara [   64.675524] ... APIC #3 ID: failed
>>>>> Oct 22 15:02:28 lara [   64.675628] ... APIC #3 VERSION: failed
>>>>> Oct 22 15:02:28 lara [   64.675731] ... APIC #3 SPIV: failed
>>>>> Oct 22 15:02:28 lara [   64.675859] skipping cpu3, didn't come online
>>>>> Oct 22 15:02:28 lara [   64.676017] PM: Removing info for No Bus:msr3
>>>>> Oct 22 15:02:28 lara [   64.676059] PM: Removing info for No Bus:cpu3
>>>>> Oct 22 15:02:28 lara [   64.676092] Error taking CPU3 up: -5
>>>>> Oct 22 15:02:28 lara [   64.676326] evxfevnt-0079 [00] enable                : System is already in ACPI mode
>>>>>
>>>>> ...
>>>>>
>>>>> After I've played with a lot boot options I found out booting with ' acpi=ht ' will make the CPU's work again but now
>>>>> I have a problem on Suspend. Everything seems to just go down disks etc but the box itself is for some reason still on.
>>>>> So I've tested reboot=<> options with no luck.
>>>>> ( after waiting 5 minutes to be sure everything is really off I can just hit power button). On resume now everything is fine.
>>>>>
>>>>> I'm not really sure what is wrong here acpi/hibernation/cpu-hotplug or a mix of all so I'm CC'ing linux-acpi as well.
>>>>> The only thing I noticed is the 'Breaking affinity for irq XX' on suspend without acpi=ht messages.
>>>>>
>>>>> I can't even tell whatever other kernel versions are working because aic7xxx driver didn't got suspend support till now 
>>>>> ( or at least never worked here ). I know suspend worked fine on windows with that box.
>>>>>
>>>>> There is my config and dmesg ( good and bad one ) :
>>>>>
>>>>>
>>>>> http://194.231.229.228/suspend/acpi=ht_working_dmesg.txt
>>>>> http://194.231.229.228/suspend/dmesg_broken_cpus_on_resume.txt
>>>>> http://194.231.229.228/suspend/config
>>>> Well, I think we have a problem with the CPU hotplug.
>>>>
>>>> Can you try to offline-online CPUs (without suspending) and see if that works?
>>> Yes does work when I do it manually :
>>>
>>> [ 6687.595842] CPU 1 is now offline
>>> [ 6687.711425] CPU 2 is now offline
>>> [ 6687.819330] CPU 3 is now offline
>>> [ 6687.819337] SMP alternatives: switching to UP code
>>> [ 6702.109605] SMP alternatives: switching to SMP code
>>> [ 6702.110634] Booting processor 1/1 eip 3000
>>> [ 6702.122140] Initializing CPU#1
>>> [ 6702.182045] Calibrating delay using timer specific routine.. 3989.26 BogoMIPS (lpj=1994633)
>>> [ 6702.182063] CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000
>>> [ 6702.182085] CPU: Trace cache: 12K uops, L1 D cache: 8K
>>> [ 6702.182091] CPU: L2 cache: 512K
>>> [ 6702.182096] CPU: Physical Processor ID: 0
>>> [ 6702.182102] CPU: After all inits, caps: bfebfbff 00000000 00000000 0000b080 00004400 00000000 00000000 00000000
>>> [ 6702.182118] Intel machine check architecture supported.
>>> [ 6702.182130] Intel machine check reporting enabled on CPU#1.
>>> [ 6702.182137] CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
>>> [ 6702.182143] CPU1: Thermal monitoring enabled
>>> [ 6702.183563] CPU1: Intel(R) Xeon(TM) CPU 2.00GHz stepping 07
>>> [ 6702.184488] checking TSC synchronization [CPU#0 -> CPU#1]: passed.
>>> [ 6702.205500] Switched to high resolution mode on CPU 1
>>> [ 6702.210400] SMP alternatives: switching to SMP code
>>> [ 6702.212196] Booting processor 2/2 eip 3000
>>> [ 6702.222693] Initializing CPU#2
>>> [ 6702.282950] Calibrating delay using timer specific routine.. 3988.88 BogoMIPS (lpj=1994443)
>>> [ 6702.282962] CPU: After generic identify, caps: 3febfbff 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> [ 6702.282974] CPU: Trace cache: 12K uops, L1 D cache: 8K
>>> [ 6702.282977] CPU: L2 cache: 512K
>>> [ 6702.282980] CPU: Physical Processor ID: 3
>>> [ 6702.282983] CPU: After all inits, caps: 3febfbff 00000000 00000000 0000b080 00000000 00000000 00000000 00000000
>>> [ 6702.282991] Intel machine check architecture supported.
>>> [ 6702.282998] Intel machine check reporting enabled on CPU#2.
>>> [ 6702.283001] CPU2: Intel P4/Xeon Extended MCE MSRs (12) available
>>> [ 6702.283005] CPU2: Thermal monitoring enabled
>>> [ 6702.283300] CPU2: Intel(R) XEON(TM) CPU 2.00GHz stepping 04
>>> [ 6702.284296] checking TSC synchronization [CPU#1 -> CPU#2]: passed.
>>> [ 6702.305317] Switched to high resolution mode on CPU 2
>>> [ 6702.312356] SMP alternatives: switching to SMP code
>>> [ 6702.313995] Booting processor 3/3 eip 3000
>>> [ 6702.324511] Initializing CPU#3
>>> [ 6702.384864] Calibrating delay using timer specific routine.. 3988.87 BogoMIPS (lpj=1994438)
>>> [ 6702.384875] CPU: After generic identify, caps: 3febfbff 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> [ 6702.384888] CPU: Trace cache: 12K uops, L1 D cache: 8K
>>> [ 6702.384891] CPU: L2 cache: 512K
>>> [ 6702.384894] CPU: Physical Processor ID: 3
>>> [ 6702.384897] CPU: After all inits, caps: 3febfbff 00000000 00000000 0000b080 00000000 00000000 00000000 00000000
>>> [ 6702.384905] Intel machine check architecture supported.
>>> [ 6702.384912] Intel machine check reporting enabled on CPU#3.
>>> [ 6702.384915] CPU3: Intel P4/Xeon Extended MCE MSRs (12) available
>>> [ 6702.384919] CPU3: Thermal monitoring enabled
>>> [ 6702.385146] CPU3: Intel(R) XEON(TM) CPU 2.00GHz stepping 04
>>> [ 6702.386252] checking TSC synchronization [CPU#1 -> CPU#3]: passed.
>>> [ 6702.407259] Switched to high resolution mode on CPU 3
>>>
>>> ...
>>>
>>> done with :
>>> for i in cpu1 cpu2 cpu3; do echo 0 >/sys/devices/system/cpu/$i/online; done
>>>
>>> for i in cpu1 cpu2 cpu3; do echo 1 >/sys/devices/system/cpu/$i/online; done
>> Hm, well.
>>
>> Please apply the appended patch and then try:
> 
> Ok will do right after I test with disabled HT
> 
>> # echo 8 > /proc/sys/kernel/printk
>> # echo 5 > /sys/power/pm_test_level
>> # echo mem > /sys/power/state
>> (should wait for approx. 3 sec. and return to the boot prompt)
>> # echo 4 > /sys/power/pm_test_level
>> # echo mem > /sys/power/state
>> (should wait for approx. 3 sec. and return to the boot prompt)
>> ...
>> # echo 1 > /sys/power/pm_test_level
>> # echo mem > /sys/power/state
>> (should wait for approx. 3 sec. and return to the boot prompt)
>>
>> and see if you can reproduce the problem and for which test level.
>>
>> [Echoing 0 to /sys/power/pm_test_level restores the normal behavior.]

Also box just froze on level 3 but I got a ACPI error at least which I didn't got in any other dmesg till now :
( also patch was tested with HT disabled and Suspend and Hibernation enabled in kernel and BIOS )

...

Oct 23 01:51:05 lara [  273.512374] PM: Removing info for No Bus:input0
Oct 23 01:51:05 lara [  274.545158] PM: Removing info for No Bus:mouse0
Oct 23 01:51:05 lara [  274.551435] PM: Removing info for No Bus:event1
Oct 23 01:51:05 lara [  274.559493] PM: Removing info for No Bus:input1
Oct 23 01:53:06 lara [  394.869468] ACPI Error (evevent-0303): No installed handler for fixed event [00000002] [20070126]

....

( I hard reseted after that ) 

I try level 2 and 1 now I just wanted to let you know.

>>
>> Greetings,
>> Rafael
>>

Gabriel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ