Message-ID: <20061103000631.GA23182@rhlx01.hs-esslingen.de>
Date:	Fri, 3 Nov 2006 01:06:31 +0100
From:	Andreas Mohr <andi@...x01.fht-esslingen.de>
To:	Andreas Mohr <andi@...x01.fht-esslingen.de>
Cc:	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...e.hu>,
	len.brown@...el.com, linux-kernel@...r.kernel.org
Subject: Re: CONFIG_NO_HZ: missed ticks, stall (keyb IRQ required) [2.6.18-rc4-mm1]

Hi,

[CC'd Len since he might want/be able to help despite this being a VIA issue ;)]

On Thu, Nov 02, 2006 at 09:34:30PM +0100, Andreas Mohr wrote:
> This time with apic=debug (attached), and now we have messages such as:
> 
> lapic timer verify: delta 10754906 pmtimer 11935676 (2557644) lapic 1180770(0 1180770 1180807) on cpu 0
> 
> which means that the timer *is* unstable.
> 
> I'm starting to wonder what I could debug here...

OK, I debugged it some more, and I noticed some very strange ACPI status
again...

$ cat /proc/acpi/processor/CPU0/*
processor id:            0
acpi id:                 0
bus mastering control:   no
power management:        no
throttling control:      no
limit interface:         no
<not supported>
active state:            C0
max_cstate:              C8
bus master activity:     00000000
maximum allowed latency: 2000 usec
states:
<not supported>

HOWEVER (dmesg.log.gz):

ACPI: CPU0 (power states: C1[C1] C2[C2])

and

ACPI: lapic on CPU 0 stops in C2[C2]

So according to /proc/acpi/ I don't even *have* C1/C2, yet they are
still being used.

Oh, wait, I just realized that of course I'm in 2.6.19-rc1-mm1 currently,
however booting into 2.6.19-rc4-mm1 *does* list C1/C2 states in /proc/acpi/,
in contrast to -rc1-mm1.
That would explain why 2.6.19-rc1-mm1 has no issues whatsoever with dynticks
- since it never even enters C1/C2!
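The comparison above (C-states announced in dmesg versus what /proc/acpi/ reports) can be sketched as a small script. This is only an illustration: the function names are made up here, the inputs are the two excerpts quoted in this mail, and the check only makes sense when both excerpts come from the same boot, which is exactly what turned out not to be the case.

```python
import re

def cstates_from_dmesg(line):
    """Return C-state names from an 'ACPI: CPUn (power states: ...)' line."""
    m = re.search(r"power states:\s*([^)]*)\)", line)
    if m is None:
        return []
    # Each entry looks like 'C1[C1]'; keep the part before the bracket.
    return [tok.split("[")[0] for tok in m.group(1).split()]

def proc_reports_states(proc_text):
    """True unless the line following 'states:' is '<not supported>'."""
    lines = [l.strip() for l in proc_text.splitlines()]
    for i, l in enumerate(lines):
        if l == "states:":
            nxt = lines[i + 1] if i + 1 < len(lines) else ""
            return nxt != "<not supported>"
    return False

# Excerpts copied from this mail (hypothetical helper names, real data).
DMESG = "ACPI: CPU0 (power states: C1[C1] C2[C2])"
PROC = """active state:            C0
max_cstate:              C8
bus master activity:     00000000
maximum allowed latency: 2000 usec
states:
<not supported>
"""

dmesg_states = cstates_from_dmesg(DMESG)
# A mismatch means dmesg announces states that /proc/acpi/ does not list.
mismatch = bool(dmesg_states) and not proc_reports_states(PROC)
print(dmesg_states, mismatch)
```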

OK, so dynticks in kernels newer than 2.6.19-rc1-mm1 broke because ACPI C1/C2
suddenly became available, and C2 kills my VIA APIC timer.

How likely is it that the APIC timer gets killed by mis-programming
in Linux rather than by a VIA chipset design flaw? I.e. do you think
there's a chance to fix the C2 malfunction by digging into the innards of
how VIA chipsets operate?
How useful would it be to simply disable C2 operation (but not C1)
in CONFIG_NO_HZ mode after it's been determined to kill the APIC timer?:

lapic timer verify: delta 3435285 pmtimer 3469523 (743469) lapic 34238(0 34238 34338) on cpu 0
lapic timer verify: delta 6022 pmtimer 46853 (10040) lapic 40831(0 40831 40914) on cpu 0
lapic timer verify: delta 66814 pmtimer 136000 (29143) lapic 69186(0 69186 69284) on cpu 0
lapic timer verify: delta 19658 pmtimer 22092 (4734) lapic 2434(0 2434 2469) on cpu 0
lapic timer verify: delta 9967 pmtimer 22624 (4848) lapic 12657(0 12657 12693) on cpu 0
lapic timer verify: delta 9681 pmtimer 21429 (4592) lapic 11748(0 11748 11945) on cpu 0
lapic timer verify: delta 59879 pmtimer 94822 (20319) lapic 34943(0 34943 35029) on cpu 0
lapic timer verify: delta 34878 pmtimer 52668 (11286) lapic 17790(0 17790 17876) on cpu 0
lapic timer verify: delta 32436 pmtimer 78992 (16927) lapic 46556(0 46556 46641) on cpu 0
lapic timer verify: delta 10450 pmtimer 75002 (16072) lapic 64552(0 64552 64590) on cpu 0
ACPI: lapic on CPU 0 stops in C2[C2]
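For anyone wanting to crunch these numbers, here is a rough sketch of a parser for the "lapic timer verify" lines. The three samples are copied from the log above with the terminal line wraps removed. The field names in the dictionary are my own labels, and I am not assuming anything about the units; the only thing checked is the arithmetic relation visible in the quoted lines, namely that delta equals pmtimer minus lapic.

```python
import re

# Matches the "lapic timer verify" lines as they appear in the log.
LINE_RE = re.compile(
    r"lapic timer verify: delta (\d+) pmtimer (\d+) \((\d+)\) "
    r"lapic (\d+)\((\d+) (\d+) (\d+)\) on cpu (\d+)"
)

def parse_verify_line(line):
    """Return the main fields of one log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    delta, pmtimer, _pm_raw, lapic, _a, _b, _c, cpu = map(int, m.groups())
    return {"delta": delta, "pmtimer": pmtimer, "lapic": lapic, "cpu": cpu}

SAMPLES = [
    "lapic timer verify: delta 3435285 pmtimer 3469523 (743469) lapic 34238(0 34238 34338) on cpu 0",
    "lapic timer verify: delta 6022 pmtimer 46853 (10040) lapic 40831(0 40831 40914) on cpu 0",
    "lapic timer verify: delta 10450 pmtimer 75002 (16072) lapic 64552(0 64552 64590) on cpu 0",
]

records = [parse_verify_line(s) for s in SAMPLES]
for r in records:
    # In these samples, delta is simply pmtimer - lapic; the ratio shows how
    # far the lapic count lags behind the pmtimer count.
    print(r["delta"], r["pmtimer"] - r["lapic"], round(r["lapic"] / r["pmtimer"], 3))
```

A ratio far below 1 in the last column would be consistent with the lapic timer having stopped for part of the interval, although the log alone does not prove why.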

Hmm, processor_idle.c in current -dynticks4 seems to contain code to do just
that: disable C states after they've been found harmful to timer operation?
But somehow it doesn't seem to work for me here, obviously.
If I don't get any further input on that I'll try to debug it myself soon.
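To make the "disable C2 but keep C1" idea concrete, here is a toy model of the policy in Python. It is not the processor_idle.c code and does not claim to mirror how -dynticks4 does it; the function name and the convention of falling back to "C0" when nothing is safe are assumptions for illustration only.

```python
def deepest_safe_cstate(available, timer_stops_in):
    """Pick the deepest C-state in which the lapic timer keeps running.

    available:      C-state names ordered from shallow to deep, e.g. ["C1", "C2"]
    timer_stops_in: set of states in which the timer was observed to stop
    """
    safe = []
    for state in available:
        if state in timer_stops_in:
            # Treat this state and everything deeper as unusable.
            break
        safe.append(state)
    # Assumption: with no safe idle state, stay in C0.
    return safe[-1] if safe else "C0"

# The situation from this mail: C1 and C2 exist, the lapic stops in C2.
print(deepest_safe_cstate(["C1", "C2"], {"C2"}))
```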

http://www.linuxsymposium.org/proceedings/reprints/Reprint-Brown-OLS2004.pdf
is quite informative about APIC timer issues etc., BTW.

Andreas Mohr