netdev - Re: Tc bug (kernel crash) more info


Message-ID: <20070830063110.GB1677@ff.dom.local>
Date:	Thu, 30 Aug 2007 08:31:10 +0200
From:	Jarek Poplawski <jarkao2@...pl>
To:	slavon@...telecom.ru
Cc:	netdev@...r.kernel.org
Subject: Re: Tc bug (kernel crash) more info

On Thu, Aug 30, 2007 at 12:16:32AM +0400, slavon@...telecom.ru wrote:
> Quoting Jarek Poplawski <jarkao2@...pl>:
> 
> >On Wed, Aug 29, 2007 at 04:53:52PM +0400, Badalian Vyacheslav wrote:
> >...
> >>we have this kernel panic (then delete HTB) at all 2.6.18-x versions.
> >>on older kernel (2.6.x) we have another panic (then delete tc filter)...
> >>summary we have TC panics 1 year ago ;) Sysctl option "reboot on panic"
> >
> >I'm not sure: do you mean it was less often? Did you try to report it
> >here? (Delete HTB: qdisc or classes?)
> >
> 
> I couldn't catch the bug before. Now I have configured netconsole to catch
> panics. For every client we run commands like:

If an error keeps repeating, you should report it even without logs.
Sometimes people here can help to catch it, and at least they will know
something is wrong in that area and look at the code more carefully.

> 
> ### command to recreate HTB
> tc filter del dev eth1 protocol ip parent 1:0 prio 5 handle 4:9:a1 u32
...

I need more time to think about it.

> On my desktop system I get a "Black dead" (2.6.22-r5): everything freezes
> (the KDE desktop stays on the monitor; mouse, keyboard, network and the
> rest do not work; the HDD LED is on; no panic).
> 
> Tell me what info you need. I will try to get it.

I still think at least your .config and dmesg output could be interesting.

> 
> PS. We also have a strange bug on another computer (2.6.22-r5).
> The computer has XEON_CPUx2 (4 CPUs).
> 
> After boot, CPU0 and CPU3 have SI = ~50%.
> After some time, CPU0 has SI = 0% and the ksoftirqd/2 process has 100% CPU usage!
> nat-new ~ # cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   0:        403          0          0          0   IO-APIC-edge      timer
...
> LOC:   89312505   89314019   89310139   89313972
> ERR:          0
> MIS:          0
> 
> Only the LOC interrupts change!
> 
> Maybe this info is interesting for you. =)

Yes. It looks like something loops or breaks with interrupts disabled.
If possible, try 2.6.23-rc4 on this box, with as few devices and as many
debug options enabled in the config as you can. Without anything in the
logs or on the screen this could be hard to track down, so you may need
to experiment with different configs and kernel versions.

Thanks,
Jarek P.

PS: if possible, you could try this patch, maybe with some fake load
plus these tc scripts (for testing only, against linux 2.6.22.5).

---

diff -Nurp linux-2.6.22.5-/net/sched/sch_htb.c linux-2.6.22.5/net/sched/sch_htb.c
--- linux-2.6.22.5-/net/sched/sch_htb.c	2007-07-09 01:32:17.000000000 +0200
+++ linux-2.6.22.5/net/sched/sch_htb.c	2007-08-29 20:32:26.000000000 +0200
@@ -394,6 +394,14 @@ static void htb_safe_rb_erase(struct rb_
 {
 	if (RB_EMPTY_NODE(rb)) {
 		WARN_ON(1);
+	} else if (RB_EMPTY_ROOT(root)) {
+		WARN_ON(1);
+	} else if (((unsigned long)rb & ~3) == 0) {
+		WARN_ON(1);
+	} else if (((unsigned long)root & ~3) == 0) {
+		WARN_ON(1);
+	} else if (rb_parent(rb) == NULL) {
+		WARN_ON(1);
 	} else {
 		rb_erase(rb, root);
 		RB_CLEAR_NODE(rb);
@@ -688,7 +696,11 @@ static void htb_rate_timer(unsigned long
 
 
 	/* lock queue so that we can muck with it */
-	spin_lock_bh(&sch->dev->queue_lock);
+	if (!spin_trylock_bh(&sch->dev->queue_lock)) {
+		q->rttim.expires = jiffies + 1;
+		add_timer(&q->rttim);
+		return;
+	}
 
 	q->rttim.expires = jiffies + HZ;
 	add_timer(&q->rttim);
@@ -1306,7 +1318,8 @@ static void htb_destroy(struct Qdisc *sc
 
 	qdisc_watchdog_cancel(&q->watchdog);
 #ifdef HTB_RATECM
-	del_timer_sync(&q->rttim);
+	if (!del_timer_sync(&q->rttim))
+		del_timer(&q->rttim);
 #endif
 	/* This line used to be after htb_destroy_class call below
 	   and surprisingly it worked in 2.4. But it must precede it
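
For readers who want to see the shape of the htb_rate_timer hunk in
isolation, here is a minimal userspace sketch of the "try the lock, and if
it is busy re-arm the timer one tick later" pattern. It is an illustration
only, not kernel code: fake_sched, fake_timer, fake_trylock, fake_unlock,
fake_add_timer and rate_timer_tick are hypothetical names, the HZ value is
assumed, and a C11 atomic_flag stands in for the device queue lock. My
reading is that the point of the pattern is to keep the timer handler from
spinning on a lock held elsewhere, but the message above does not say so
explicitly.

```c
/* Userspace illustration only; every name here is hypothetical. */
#include <stdatomic.h>
#include <stdbool.h>

#define HZ 100	/* assumed ticks per second, for illustration */

struct fake_timer {
	unsigned long expires;	/* tick at which the timer should fire */
	bool pending;		/* true once the timer has been (re)armed */
};

struct fake_sched {
	atomic_flag queue_lock;	/* stand-in for the device queue lock */
	struct fake_timer rttim;
	unsigned long recomputed;	/* counts runs that did the real work */
};

static unsigned long jiffies;	/* stand-in for the kernel tick counter */

/* Returns true if the lock was taken, false if it was already held. */
static bool fake_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void fake_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

static void fake_add_timer(struct fake_timer *t, unsigned long expires)
{
	t->expires = expires;
	t->pending = true;
}

/*
 * Timer handler: if the lock is busy, do not wait for it; re-arm for the
 * next tick and return. Otherwise re-arm for one second later and do the
 * periodic work under the lock.
 * Returns true if the work was done, false if it was deferred.
 */
static bool rate_timer_tick(struct fake_sched *q)
{
	if (!fake_trylock(&q->queue_lock)) {
		fake_add_timer(&q->rttim, jiffies + 1);
		return false;
	}

	fake_add_timer(&q->rttim, jiffies + HZ);
	q->recomputed++;	/* placeholder for the rate computation */
	fake_unlock(&q->queue_lock);
	return true;
}
```

Calling rate_timer_tick() while another holder has the lock leaves the
work undone and sets rttim.expires to jiffies + 1; once the lock is free,
the next call does the work and sets rttim.expires to jiffies + HZ.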