Message-ID: <48A46618.4060306@ionic.de>
Date: Thu, 14 Aug 2008 19:06:32 +0200
From: Mihai Moldovan <ionic@...ic.de>
To: LKML <linux-kernel@...r.kernel.org>
Subject: [PROBLEM] Kernel crashes with 2.6.25-rc1 and above
Dear kernel hackers,
as the subject line indicates, I have a problem: all kernels from
2.6.25-rc1 onward crash on my notebook after a *random* amount of time,
which prevents me from using them.
When I first noticed the problem, I tried to get a usable result by
bisecting the kernel, but after two weeks of doing nothing but bisecting,
I gave up.
My machine locks up after a random amount of uptime, and this is a real
problem. Before bisecting, I thought that it would take at most 30
minutes (and in fact, newer kernels seem to crash more quickly than older
ones), but while bisecting I found that it can just as well take 2 or
4 hours for the box to crash. This means that all my bisecting efforts
were for nothing, because I may have marked versions as good that were
in fact bad (at first I marked every kernel that still worked after
1 hour of uptime as "good"; later I changed that to 2 hours, but I
still...)
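To illustrate why a fixed uptime cutoff is unreliable here, the following
minimal sketch estimates how often a bad kernel would survive the cutoff
and be marked "good". It assumes, purely for illustration, that the time
to lock-up is exponentially distributed; the mean of 2 hours is a
hypothetical value, not a measurement from this report, and
p_false_good is a made-up helper name.

```python
import math

def p_false_good(cutoff_hours, mean_hours_to_crash):
    """Probability that a bad kernel survives the cutoff, assuming an
    exponentially distributed time to lock-up (an assumption, see above)."""
    return math.exp(-cutoff_hours / mean_hours_to_crash)

# Hypothetical mean time to crash of 2 hours; the 1 h and 2 h cutoffs
# are the ones used while bisecting, 24 h is the cautious alternative.
for cutoff in (1, 2, 24):
    print(f"{cutoff:>2} h cutoff: {p_false_good(cutoff, 2.0):.4f}")
```

Under this assumption, short cutoffs leave a sizeable chance of a wrong
"good", and a single wrong "good" is enough to send a bisection into the
wrong half of the commit range.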
All in all, the problem is that I cannot really say whether a version is
good or bad except by letting the box run for x hours... and x is
undefined. It might be safe to let the box run for 24 hours per kernel
before marking it good or bad, but given that I would have to test 13 or
more kernels, that means about two weeks of doing nothing but testing
kernels. I hope you can bear with me; that is really a lot of time.
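The time estimate above can be checked with a short calculation. The
sketch below uses the figures from the text (13 kernels, 24 hours each)
and also shows roughly how large a commit range a bisection of that many
steps can cover; the function names are hypothetical.

```python
import math

def total_test_days(steps, hours_per_step):
    """Wall-clock days needed when each bisection step is one test run."""
    return steps * hours_per_step / 24

def bisect_steps(commits):
    """Approximate number of test runs a bisection needs for a commit
    range of the given size (each run halves the remaining range)."""
    return math.ceil(math.log2(commits))

print(total_test_days(13, 24))  # 13 kernels at 24 h each, in days
print(bisect_steps(8192))       # a range that needs 13 halvings
```

Thirteen runs of 24 hours each come to 13 days of testing, which matches
the roughly two weeks mentioned above.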
Describing what happens is simple: the machine locks up completely. No
input or output works anymore; the kernel responds neither to SysRq key
presses nor to ping. Because of this, no panic message is logged, and
honestly, I have not seen one this whole time either.
I really am confused about this.
The only messages I could get were "Hangcheck: hangcheck value past
margin!", "rtc: lost y interrupts" (y is quite random as well) and this
one, when running hwclock:
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on+0x9b/0x10d()
Modules linked in: irtty_sir sir_dev ipw2200 yenta_socket rsrc_nonstatic pcmcia_core tifm_7xx1 tifm_core sky2
Pid: 2704, comm: hwclock Not tainted 2.6.24-uvesafb-tuxonice-squashFS3.2-04814-gd2e626f #1
[<c01205ec>] warn_on_slowpath+0x41/0x51
[<c010b376>] ? save_stack_address+0x0/0x28
[<c013a2e1>] ? check_usage_forwards+0x19/0x3b
[<c013b726>] ? __lock_acquire+0xac2/0xb0a
[<c03942db>] ? ata_qc_complete+0x115/0x128
[<c0108c60>] ? native_sched_clock+0x8b/0x9f
[<c0138b89>] ? put_lock_stats+0xd/0x21
[<c05362ec>] ? _spin_unlock_irq+0x22/0x42
[<c013a83f>] trace_hardirqs_on+0x9b/0x10d
[<c05362ec>] _spin_unlock_irq+0x22/0x42
[<c0114829>] hpet_rtc_interrupt+0xdf/0x290
[<c01509d8>] handle_IRQ_event+0x1a/0x46
[<c0151832>] handle_edge_irq+0xbe/0xff
[<c0151774>] ? handle_edge_irq+0x0/0xff
[<c0106f09>] do_IRQ+0xab/0xd4
[<c010555a>] common_interrupt+0x2e/0x34
=======================
---[ end trace 3f0a8d3fa0ba549b ]---
I *suspect* that the RTC subsystem _might_ be related to my problem,
because all of these warning messages first appeared at some point in
2.6.24, but I cannot really say that they are what makes my machine
crash.
At this point, I am out of ideas and hope that some experienced person
can help me.
Best regards,
Mihai