Date:	Wed, 26 Sep 2007 17:25:49 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Andi Kleen <ak@...e.de>, Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>
Subject: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

Thomas,

On Tuesday, 25 September 2007 23:24, Thomas Gleixner wrote:
> Rafael,
> 
> On Tue, 2007-09-25 at 23:28 +0200, Rafael J. Wysocki wrote:
> > > I'm a bit confused by your earlier confirmation, that mainline w/o the
> > > -hrt patches boots fine, when you add "apicmaintimer" to the kernel
> > > command line. "apicmaintimer" stops the PIT like we do in -hrt and we
> > > just use the local APIC timer for everything. Can you please retest and
> > > confirm that this is correct ?
> > 
> > No, it's not.  The mainline _usually_ doesn't boot with "apicmaintimer".
> > 
> > It seems to me that _sometimes_ the CPU just doesn't enter this C1E state
> > and then everything goes fine ...
> 
> I'm relieved. I really started to go nuts on this contradicting
> patterns.
> 
> Your box seems to be worse than the VAIO, it has some random surprise
> generator built in :)
> 
> > > Is the 32 bit kernel working on that box ?
> > 
> > Can't tell, I have only 64-bit userland here.
> 
> Should be fine. The check is there since late 2.6.21-rc. I really could
> kick my own ass that I did not remember the nx6325 wreckage in the
> 2.6.21-rc time frame. Sigh, way too much broken hardware out there to
> keep track of it.
> 
> > > Thanks for your patience.
> > 
> > Well, I'm only making sure that future kernels will run on my box. ;-)
> 
> Nothing wrong with that. Thanks again for your help,

There still are some oddities.

First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
patch and my collection of suspend patches applied, the box doesn't boot
(the suspend patches don't even touch the boot code, so they should be
irrelevant here).  However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
for 2.6.23-rc8) is applied in addition.  Is this expected?

Next, on 2.6.23-rc8 with the patches from:

http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc8/patches/

plus the "x86-64: Disable local APIC timer use on AMD systems with C1E" patch
and patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8), hibernation doesn't
work correctly.  Although the box hibernates and restores, there is a temporary
"hang" during the "resume hardware" sequence, after which the "lock" LED starts
to blink (and remains in this state) and something like this appears in dmesg:

Extended CMOS year: 2000
Enabling non-boot CPUs ...
SMP alternatives: switching to SMP code
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 3990.36 BogoMIPS (lpj=7980735)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
Unable to handle kernel paging request at ffffffff806c64d4 RIP: 
 [<ffffffff802104cb>] identify_cpu+0x2ac/0x5a1
PGD 203067 PUD 207063 PMD 37fb4163 PTE 6c6000
Oops: 0002 [1] SMP 
CPU 1 
Modules linked in: ip6t_LOG nf_conntrack_ipv6 xt_pkttype ipt_LOG xt_limit cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table thermal processor fan snd_pcm_oss button snd_mixer_oss snd_seq battery snd_seq_device ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter ip6table_mangle nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop dm_mod rfcomm hidp l2cap usbhid ff_memless psmouse hci_usb bluetooth pcmcia tg3 ohci_hcd snd_hda_intel ehci_hcd yenta_socket rsrc_nonstatic ide_cd ohci1394 k8temp i2c_piix4 pcmcia_core sdhci shpchp snd_pcm usbcore hwmon i2c_core rtc_cmos rtc_core rtc_lib ieee1394 mmc_core tifm_7xx1 tifm_core pci_hotplug snd_timer cdrom snd firmware_class ieee80211softmac ieee80211 ieee80211_crypt soundcore snd_page_alloc ext3 jbd edd atiixp ide_disk ide_core sg
Pid: 0, comm: swapper Not tainted 2.6.23-rc8-rjw #6
RIP: 0010:[<ffffffff802104cb>]  [<ffffffff802104cb>] identify_cpu+0x2ac/0x5a1
RSP: 0018:ffff810037abdea8  EFLAGS: 00010006
RAX: 0000000014008015 RBX: 0000000001020800 RCX: 00000000c0010055
RDX: 0000000000000000 RSI: 0000000000040000 RDI: 0000000000000001
RBP: ffff810037abded8 R08: 0000000000000000 R09: ffffffff80444ad0
R10: ffffffff8070c860 R11: 0000000000000001 R12: ffffffff805920c0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff810037ac3e88(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff806c64d4 CR3: 0000000000201000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff810037abc000, task ffff810037a8f800)
Stack:  0000000f4e5a1540 000000000000059f 0000000000000001 ffffffff805920c0
 0000000000000001 0000000000000000 ffff810037abdef8 ffffffff8021acaa
 000000000000059f 0000000000000000 ffff810037abdf48 ffffffff8021b380
Call Trace:
 [<ffffffff8021acaa>] smp_callin+0xc8/0xde
 [<ffffffff8021b380>] start_secondary+0x1b/0x2e8


Code: c7 05 ff 5f 4b 00 01 00 00 00 e9 4f 01 00 00 4c 89 e7 e8 27 
RIP  [<ffffffff802104cb>] identify_cpu+0x2ac/0x5a1
 RSP <ffff810037abdea8>
CR2: ffffffff806c64d4
Kernel panic - not syncing: Attempted to kill the idle task!
Stuck ??
Inquiring remote APIC #1...
... APIC #1 ID: 01000000
... APIC #1 VERSION: 80050010
... APIC #1 SPIV: 000000ff
Error taking CPU1 up: -5
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access hardware directly.
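
[Editorial note: the oops above can be cross-checked by hand. The following
is a minimal Python sketch, not part of the original report. It assumes that
the "Code:" line starts exactly at the reported RIP (the log shows no marker
for the faulting byte) and that the first ten bytes form a single
"movl $imm32, disp32(%rip)" instruction. The helper names are hypothetical.
Under those assumptions the store target should equal CR2, and the
"Oops: 0002" value can be split into the standard x86 page-fault error bits.
The sketch does not identify which kernel variable lives at that address;
that cannot be told from the log alone.]

```python
import struct

# Values copied from the oops in the log above.
RIP = 0xFFFFFFFF802104CB
CR2 = 0xFFFFFFFF806C64D4
OOPS_ERROR_CODE = 0x0002
# First ten bytes of the "Code:" line (assumed to start at RIP).
CODE = bytes.fromhex("c705ff5f4b0001000000")


def decode_mov_imm32_rip_rel(code, rip):
    """Decode `movl $imm32, disp32(%rip)` (opcode C7 /0, ModRM byte 0x05).

    Returns (target_address, immediate). Hypothetical helper for this sketch.
    """
    if len(code) < 10 or code[0] != 0xC7 or code[1] != 0x05:
        raise ValueError("not a RIP-relative movl $imm32 store")
    # Signed 32-bit displacement followed by the 32-bit immediate.
    disp, imm = struct.unpack_from("<iI", code, 2)
    # RIP-relative addressing is relative to the *next* instruction,
    # and this encoding is 10 bytes long.
    next_rip = rip + 10
    target = (next_rip + disp) & 0xFFFFFFFFFFFFFFFF
    return target, imm


def decode_page_fault_error(err):
    """Split the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(err & 0x1),  # 0 means page not present
        "write": bool(err & 0x2),                 # 0 means read
        "user_mode": bool(err & 0x4),             # 0 means kernel mode
    }


target, imm = decode_mov_imm32_rip_rel(CODE, RIP)
fault = decode_page_fault_error(OOPS_ERROR_CODE)

print(hex(target), imm)      # 0xffffffff806c64d4 1
print(target == CR2)         # True
print(fault)
```

[Read this way, CPU 1 faulted in identify_cpu() while storing the value 1 to
a kernel address whose page was not present at the time, which matches the
"Unable to handle kernel paging request at ffffffff806c64d4" line.]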

Looks like the CPU hotplug is busted, or what?

Full dmesg attached.

Greetings,
Rafael

[Attachment: "2.6.23-rc8-rjw.log", type "text/x-log", 51417 bytes]
