linux-kernel - Re: upstream regression (IO-APIC?)

Message-Id: <1225734574.8168.16.camel@alok-dev1>
Date:	Mon, 03 Nov 2008 09:49:34 -0800
From:	Alok Kataria <akataria@...are.com>
To:	Bartlomiej Zolnierkiewicz <bzolnier@...il.com>
Cc:	Ingo Molnar <mingo@...e.hu>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Robert Hancock <hancockr@...w.ca>,
	Arjan van de Ven <arjan@...radead.org>,
	Pavel Machek <pavel@...e.cz>
Subject: Re: upstream regression (IO-APIC?)

On Sun, 2008-11-02 at 12:24 -0800, Bartlomiej Zolnierkiewicz wrote:
> On Sunday 02 November 2008, Bartlomiej Zolnierkiewicz wrote:
> > On Thursday 30 October 2008, Robert Hancock wrote:
> > > Bartlomiej Zolnierkiewicz wrote:
> > > > The current Linus tree as of commit e946217e4fdaa67681bbabfa8e6b18641921f750
> > > > is broken for me.  I get either the following panic (see log from qemu below)
> > > > or lost IRQs on ATA init...  Is this a known issue?
> > > > 
> > > > PS The tree that I used before and was supposedly good (sorry, I'm too tired
> > > > to verify it now) had commit 57f8f7b60db6f1ed2c6918ab9230c4623a9dbe37 at head.
> > 
> > Unfortunately 57f8f7b60db6f1ed2c6918ab9230c4623a9dbe37 (v2.6.28-rc1)
> > is also bad.  Bisecting it further was a real pain (i.e. I hit broken
> > build with x86 irqbalance changes, broken build with netfilter nat
> > changes and jbd journal problem).  In the end it turned out that 2.6.27
> > is bad too!  However with 2.6.27 the panic occurs only once per several
> > attempts and if there is no panic kernel boots normally (no lost IRQs).
> > 
> > [...]
> > 
> > I finally managed to narrow it down to change making x86 use tsc_khz
> > for loops_per_jiffy -- commit 3da757daf86e498872855f0b5e101f763ba79499
> > ("x86: use cpu_khz for loops_per_jiffy calculation").  This approach
> > seems too simplistic (as I see now Arjan & Pavel expressed concerns
> > about it back when the patch was posted initially [1][2]).  Also it
> > would probably be preferred to re-use existing preset_lpj variable
> > (just like KVM does it for similar purpose [3]) instead of adding a
> > lpj_tsc one and increasing complexity.
> 
> It turned out that I can boot a kernel with different config with
> HZ == 250 just fine and switching to HZ == 1000 makes it fail.
> 
> 
> Looking into it some more:
> 
> HZ == 250 kernel (good):
> 
> Calibrating delay loop (skipped), value calculated using timer frequency.. 2986.79 BogoMIPS (lpj=5973580)
> 
> HZ == 1000 kernel (bad):
> 
> Calibrating delay loop (skipped), using tsc calculated value.. 2990.35 BogoMIPS (lpj=1495176)
> 
> HZ == 1000 kernel with hackyfix (good):
> 
> Calibrating delay using timer specific routine.. 3016.68 BogoMIPS (lpj=6033376)
> 
> 
> Argggh... lpj is used for udelay() & friends so this bug is quite
> dangerous (since udelay() & friends are used for hardware delays)...
> 
> [ The commit works for HZ == 250 because it does tsc_khz * 1000 / HZ,
>   tsc_khz * 4 => lpj assumption holds true and there is no frequency
>   scaling at boot. ]
> 
> The quick fix would be to replace 1000 / HZ by the magic number "4"

That's not right; hard-coding the magic number 4 would not be correct.
On one of my systems, for example, I get this in dmesg:

Detected 2010.400 MHz processor.
...
Calibrating delay using timer specific routine.. 4022.47 BogoMIPS (lpj=2011235)

This is with an earlier kernel, with HZ set to 1000. The lpj value that
we get from calculating (tsc_khz * 1000) / HZ is correct in this case,
and the assumption holds true on all the systems that I have checked.

One thing I suspect is that you are not using delay_tsc in this case,
i.e. the TSC is not being used for delays, and that is what causes the
panic.

Can you please try the patch below on your system?

[test-patch]

Index: linux-2.6/arch/x86/kernel/tsc.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/tsc.c	2008-10-15 10:51:14.000000000 -0700
+++ linux-2.6/arch/x86/kernel/tsc.c	2008-11-03 09:43:01.000000000 -0800
@@ -847,10 +847,6 @@
 		cpu_khz = calibrate_cpu();
 #endif
 
-	lpj = ((u64)tsc_khz * 1000);
-	do_div(lpj, HZ);
-	lpj_fine = lpj;
-
 	printk("Detected %lu.%03lu MHz processor.\n",
 			(unsigned long)cpu_khz / 1000,
 			(unsigned long)cpu_khz % 1000);
@@ -871,6 +867,10 @@
 	tsc_disabled = 0;
 
 	use_tsc_delay();
+	lpj = ((u64)tsc_khz * 1000);
+	do_div(lpj, HZ);
+	lpj_fine = lpj;
+
 	/* Check and install the TSC clocksource */
 	dmi_check_system(bad_tsc_dmi_table);
 	check_system_tsc_reliable();


> but the major question is whether we can reliably depend on tsc_khz
> for lpj?

If the patch above doesn't help, I think the answer to your question is:
not on some particular hardware, but then we would know.
By the way, what hardware are you running this on?

Thanks,
Alok
> 
> Thanks,
> Bart
