Date:	Thu, 21 Aug 2008 13:18:23 +0200
From:	"Vegard Nossum" <vegard.nossum@...il.com>
To:	"Maciej W. Rozycki" <macro@...ux-mips.org>
Cc:	"Rafael J. Wysocki" <rjw@...k.pl>, "Frans Pop" <elendil@...net.nl>,
	linux-kernel@...r.kernel.org, "Andi Kleen" <andi@...stfloor.org>,
	"Ingo Molnar" <mingo@...e.hu>
Subject: Re: 2.6.27-rc3: 'APIC error on CPU1: 00(40)', but only on resume!

On Thu, Aug 21, 2008 at 11:27 AM, Maciej W. Rozycki
<macro@...ux-mips.org> wrote:
> On Wed, 20 Aug 2008, Rafael J. Wysocki wrote:
>
>> On my box I see many "APIC error on CPU1: 00(40)" messages that don't seem
> to be related to anything obviously bad and I've always been seeing them.
>
>  Barring a hardware erratum, this is a bug in the kernel.  It should be
> moderately easy to track down with some debugging added to writes
> accessing LVT and redirection table entries.

Hi,

I've also seen this a lot, so I have now written what I think is such
a debug patch (it's very crude) and tested it on my laptop, which
exhibits this problem.

The patch and full dmesg (with debug output) can be found here:

http://userweb.kernel.org/~vegard/bugs/20080821-apic/

The output looks like this (register annotations are mine; the CPU id
is the second column):

APIC error on CPU0: 00(40)
Last 16 APIC writes:
0: 1: [00000380] = 00001f79
1: 1: [000000b0] = 00000000
2: 1: [00000380] = 00001f7e
3: 1: [000000b0] = 00000000
4: 1: [00000380] = 00001fa5
5: 1: [000000b0] = 00000000
6: 1: [00000380] = 00001f8c
7: 1: [000000b0] = 00000000
8: 1: [000000b0] = 00000000
9: 1: [00000380] = 00001e4e
10: 1: [000000b0] = 00000000
11: 1: [00000380] = 00001fa5
12: 1: [000000b0] = 00000000
13: 1: [00000380] = 00001f87 # Initial Count Register (for Timer)
14: 0: [00000280] = 00000000 # Error Status Register
15: 0: [000000b0] = 00000000 # EOI Register

The writes are listed from oldest (0) to newest (15). I don't see any
writes to the ICR in there; does that mean IPIs can be ruled out? It
seems to be the write to the timer's Initial Count Register that
triggers the error. Elsewhere in the log, we have this:

13: 1: [00000320] = 000100ef # LVT Timer Register
14: 0: [00000280] = 00000000
15: 0: [000000b0] = 00000000

This would be APIC_LVT_MASKED | LOCAL_TIMER_VECTOR.
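
For reference, a minimal sketch of how that value splits into its
fields. It is not part of the debug patch. The constant values are
assumptions taken from the x86 APIC definitions of that era (mask bit
16, periodic-mode bit 17, timer vector 0xef), and decode_lvt_timer is a
made-up helper name.

```python
# Assumed values, mirroring the kernel's x86 APIC definitions.
APIC_LVT_MASKED = 1 << 16          # LVT entry is masked
APIC_LVT_TIMER_PERIODIC = 1 << 17  # timer mode: periodic when set
LOCAL_TIMER_VECTOR = 0xEF          # assumed local timer interrupt vector


def decode_lvt_timer(value):
    """Split an LVT Timer Register value into the fields of interest."""
    return {
        "vector": value & 0xFF,
        "masked": bool(value & APIC_LVT_MASKED),
        "periodic": bool(value & APIC_LVT_TIMER_PERIODIC),
    }


fields = decode_lvt_timer(0x000100EF)
print(fields)
```

With the value from the dump, the vector field is 0xef, the mask bit is
set and the periodic bit is clear, which matches
APIC_LVT_MASKED | LOCAL_TIMER_VECTOR.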

The APIC error is seen approximately every 3 minutes.
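
The register annotations and the check for ICR writes were done by
hand. A small script along these lines could do the same over the full
dmesg. It is only a sketch, not part of the debug patch: the function
names are made up for illustration, the line format is assumed to match
the dump shown above, and the offset table only covers the registers
discussed here plus the ICR (0x300/0x310 in the usual local APIC
layout).

```python
import re

# Local APIC register offsets; only the ones relevant to this dump.
APIC_REGS = {
    0x0B0: "EOI",
    0x280: "ESR",
    0x300: "ICR (low)",
    0x310: "ICR (high)",
    0x320: "LVT Timer",
    0x380: "Timer Initial Count",
}

# Matches lines such as "13: 1: [00000380] = 00001f87"; any trailing
# "# ..." annotation is ignored.
LINE_RE = re.compile(
    r"^\s*(\d+):\s*(\d+):\s*\[([0-9a-fA-F]{8})\]\s*=\s*([0-9a-fA-F]{8})"
)


def parse_writes(text):
    """Return (index, cpu, register offset, value) for each write line."""
    writes = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            idx, cpu, reg, val = m.groups()
            writes.append((int(idx), int(cpu), int(reg, 16), int(val, 16)))
    return writes


def annotate(writes):
    """Render each write with a register name instead of a raw offset."""
    return [
        f"{idx}: cpu{cpu}: {APIC_REGS.get(reg, hex(reg))} = {val:#010x}"
        for idx, cpu, reg, val in writes
    ]


def has_icr_write(writes):
    """True if any recorded write targets either half of the ICR."""
    return any(reg in (0x300, 0x310) for _, _, reg, _ in writes)


sample = """\
13: 1: [00000380] = 00001f87
14: 0: [00000280] = 00000000
15: 0: [000000b0] = 00000000
"""
writes = parse_writes(sample)
print("\n".join(annotate(writes)))
print("ICR written:", has_icr_write(writes))
```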


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
