Message-ID: <20070830063110.GB1677@ff.dom.local>
Date: Thu, 30 Aug 2007 08:31:10 +0200
From: Jarek Poplawski <jarkao2@...pl>
To: slavon@...telecom.ru
Cc: netdev@...r.kernel.org
Subject: Re: Tc bug (kernel crash) more info
On Thu, Aug 30, 2007 at 12:16:32AM +0400, slavon@...telecom.ru wrote:
> Quoting Jarek Poplawski <jarkao2@...pl>:
>
> >On Wed, Aug 29, 2007 at 04:53:52PM +0400, Badalian Vyacheslav wrote:
> >...
> >>We have this kernel panic (when deleting HTB) on all 2.6.18-x versions.
> >>On older kernels (2.6.x) we had another panic (when deleting a tc filter)...
> >>In summary, we have had TC panics for a year ;) Sysctl option "reboot on panic"
> >
> >I'm not sure: do you mean it was less often? Did you try to report it
> >here? (Delete HTB: qdisc or classes?)
> >
>
> I couldn't catch the bug. Now I have configured netconsole to catch panics.
> For every client we run a command like:

If an error repeats, you should report it even without logs. Sometimes
people here can help to catch it, and at least they will know something
is wrong in that area and look at the code more carefully.
>
> ### command to recreate HTB
> tc filter del dev eth1 protocol ip parent 1:0 prio 5 handle 4:9:a1 u32
...
I need more time to think about it.
> On my desktop system I get a "Black dead" (2.6.22-r5): everything freezes
> (the KDE desktop stays on the monitor; mouse, keyboard, network and
> everything else stop working; the HDD LED is on; no panics).
>
> Tell me what info you need. I will try to get it.
I still think that at least .config and dmesg output could be useful.
>
> PS. We also have a strange bug on another computer (2.6.22-r5).
> It has two Xeon CPUs (4 CPUs as seen by the kernel).
>
> After boot, CPU0 and CPU3 show SI (softirq time) of ~50%.
> After some time, CPU0 SI drops to 0% and the ksoftirqd/2 process uses 100% CPU!
> nat-new ~ # cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   0:        403          0          0          0   IO-APIC-edge  timer
...
> LOC:   89312505   89314019   89310139   89313972
> ERR:          0
> MIS:          0
>
> Only the LOC interrupt counts change!
>
> Maybe this info is interesting for you. =)
Yes. It seems something loops or breaks with interrupts disabled. If
possible, try 2.6.23-rc4 on this box (with as few devices and as many
debug options enabled in the config as possible). Without anything in
the logs or on the screen this could be hard, so you may need to
experiment with different configs and kernel versions.
Thanks,
Jarek P.
PS: If possible, you could try this patch, maybe with some fake load
plus these tc scripts (for testing only, against linux 2.6.22.5).
---
diff -Nurp linux-2.6.22.5-/net/sched/sch_htb.c linux-2.6.22.5/net/sched/sch_htb.c
--- linux-2.6.22.5-/net/sched/sch_htb.c 2007-07-09 01:32:17.000000000 +0200
+++ linux-2.6.22.5/net/sched/sch_htb.c 2007-08-29 20:32:26.000000000 +0200
@@ -394,6 +394,14 @@ static void htb_safe_rb_erase(struct rb_
{
if (RB_EMPTY_NODE(rb)) {
WARN_ON(1);
+ } else if (RB_EMPTY_ROOT(root)) {
+ WARN_ON(1);
+ } else if (((unsigned long)rb & ~3) == 0) {
+ WARN_ON(1);
+ } else if (((unsigned long)root & ~3) == 0) {
+ WARN_ON(1);
+ } else if (rb_parent(rb) == NULL) {
+ WARN_ON(1);
} else {
rb_erase(rb, root);
RB_CLEAR_NODE(rb);
@@ -688,7 +696,11 @@ static void htb_rate_timer(unsigned long
/* lock queue so that we can muck with it */
- spin_lock_bh(&sch->dev->queue_lock);
+ if (!spin_trylock_bh(&sch->dev->queue_lock)) {
+ q->rttim.expires = jiffies + 1;
+ add_timer(&q->rttim);
+ return;
+ }
q->rttim.expires = jiffies + HZ;
add_timer(&q->rttim);
@@ -1306,7 +1318,8 @@ static void htb_destroy(struct Qdisc *sc
qdisc_watchdog_cancel(&q->watchdog);
#ifdef HTB_RATECM
- del_timer_sync(&q->rttim);
+ if (!del_timer_sync(&q->rttim))
+ del_timer(&q->rttim);
#endif
/* This line used to be after htb_destroy_class call below
and surprisingly it worked in 2.4. But it must precede it