linux-kernel - Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem

Message-ID: <49829.86808.qm@web82105.mail.mud.yahoo.com>
Date:	Mon, 11 Aug 2008 11:01:42 -0700 (PDT)
From:	David Witbrodt <dawitbro@...global.net>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, Yinghai Lu <yhlu.kernel@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>, netdev <netdev@...r.kernel.org>
Subject: Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem

Thanks for the reply.  Let me address these out of order (cutting
your post):

> David, it would be nice to check whether tip/master still locks up for 
> you:
> 
>     http://people.redhat.com/mingo/tip.git/README
> 
> just to make sure no pending fix resolves your issue. (the bug is 
> probably still present, but might be worth checking nevertheless.)

Just to recap (not sure if you saw the earlier posts in the thread, going
back to Monday a week ago):

- kernel 2.6.25 worked for me without error

- kernel 2.6.26 locked up at boot when it finally became available in
  Debian

- I was asked to grab Linus' git tree and try that, last Tuesday --
  v2.6.27-rc1 locked up like the 2.6.26 kernels.

OK, now grabbing "tip":
===== BEGIN SHELL STUFF =========
$ git remote add tip git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git
$ git remote update
$ git checkout tip/master

$ git show
commit 5cbd27ebcd387cbc48c47712eb671f35d85a575f
Merge: 2097987... 2ae111c...
Author: Ingo Molnar <mingo@...e.hu>
Date:   Mon Aug 11 16:43:42 2008 +0200

    Merge branch 'x86/core'
===== END SHELL STUFF =========

I copied into .config the configuration that I had prepared for 2.6.26
(which was prepared from my most recent version for 2.6.25) and ran
'make oldconfig'.  The kernel built fine, so I installed it and rebooted.

The kernel locked up as expected at inet_init().  Rebooting, and adding
"hpet=disable", allowed it to boot just fine... as expected.

I understand the need to do this, and was hoping that it would just
magically start working again.  But it didn't ... which means I don't
have to bisect again to find out when, where, and why!  ;)

> You can probably verify this by adding something like this to 
> kernel/timer.c's do_timer() function:
> 
>    if (printk_ratelimit())
>        printk("timer irq hit, jiffies: %lu\n", jiffies);

So I made this change to do_timer(), and rebuilt the kernel.

Unfortunately, I cannot report whether this change made a difference
in the output.  This is a 2.5 GHz AMD X2 4850e processor, and in the
few moments before freeze (maybe < 2 secs) the lines just scroll too
fast for me to read them.  Once the kernel locks up, that's it... no
more printk()'s.

I can report that the do_timer() change worked correctly, though.  I
rebooted with "hpet=disabled", and just happened to hit an automatic
fsck on a 460 GB partition.  The printk()'s interfere with the fsck 
progress indicator: about every 4 secs, ten of those printk()'s fire 
off very quickly.  (Pretty annoying, actually.)

> Yinghai, do you have any ideas about this particular problem? One theory 
> would be that your e820 changes might have caused a shuffling of 
> resources that made the hpet's timer IRQ generation inoperable.

It's so weird that this commit works on so many machines, but fails on
the 2 machines I have (with the same motherboard).  Of course, many
more people/machines might be affected by this issue, but they simply
aren't using 2.6.26 yet... and will find out later, the hard way!

Thanks,
Dave W.