netdev - Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)


Message-ID: <20090709104412.GA3651@ami.dom.local>
Date:	Thu, 9 Jul 2009 12:44:12 +0200
From:	Jarek Poplawski <jarkao2@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Andres Freund <andres@...razel.de>,
	Joao Correia <joaomiguelcorreia@...il.com>,
	Arun R Bharadwaj <arun@...ux.vnet.ibm.com>,
	Stephen Hemminger <shemminger@...tta.com>,
	netdev@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
	Patrick McHardy <kaber@...sh.net>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (
	possibly?caused by netem)

On Thu, Jul 09, 2009 at 12:31:53PM +0200, Thomas Gleixner wrote:
> On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > On Thu, Jul 09, 2009 at 12:23:17AM +0200, Andres Freund wrote:
> > ...
> > > Unfortunately this just yields the same backtraces during softlockup and not 
> > > earlier.
> > > I did not test without lockdep yet, but that should not have stopped the BUG 
> > > from appearing, right?
> > 
> > Since it looks like hrtimers now, these changes in timers shouldn't
> > matter. Let's wait for new ideas.
> 
> Some background:
...
> There is another oddity in cbq_undelay() which is the hrtimer callback
> function:
> 
> 	if (delay) {
> 		ktime_t time;
> 
> 		time = ktime_set(0, 0);
> 		time = ktime_add_ns(time, PSCHED_TICKS2NS(now + delay));
> 		hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS);
> 
> The canonical way to restart a hrtimer from the callback function is to
> set the expiry value and return HRTIMER_RESTART.

OK, that's for later because we didn't use cbq here.

> 
> 	}
> 
> 	sch->flags &= ~TCQ_F_THROTTLED;
> 	__netif_schedule(qdisc_root(sch));
> 	return HRTIMER_NORESTART;
> 
> Again, this should not cause the timer to be enqueued on another CPU
> as we do not enqueue on a different CPU when the callback is running,
> but see above ...
> 
> I have the feeling that the code relies on some implicit cpu
> boundness, which is no longer guaranteed with the timer migration
> changes, but that's a question for the network experts.

As a matter of fact, I've just looked at this __netif_schedule(),
which really is cpu bound, so you might be 100% right.

Thanks for your help,
Jarek P.
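
For readers following the cbq_undelay() remark quoted above, here is a minimal userspace sketch of the "set the expiry value and return HRTIMER_RESTART" pattern. It is only a model: every name in it (model_timer, model_timer_start, model_timer_fire, undelay_cb, MODEL_RESTART) is hypothetical and none of it is kernel API. The assumption is simply that a timer core re-queues a timer when the callback asks for a restart, so the callback never has to call the start function on itself.

```c
#include <stdint.h>

/* Userspace model only. Names echo the kernel's, but nothing here is kernel API. */
enum model_restart { MODEL_NORESTART, MODEL_RESTART };

struct model_timer {
    uint64_t expires_ns;   /* absolute expiry time */
    uint64_t delay_ns;     /* pending extra delay, 0 means none */
    int enqueued;          /* 1 while the timer is queued */
    int start_calls;       /* number of explicit start requests */
    int work_done;         /* stands in for the rest of the callback's work */
    enum model_restart (*fn)(struct model_timer *t, uint64_t now_ns);
};

/* Explicit arming from outside the callback. */
static void model_timer_start(struct model_timer *t, uint64_t expires_ns)
{
    t->expires_ns = expires_ns;
    t->enqueued = 1;
    t->start_calls++;
}

/* The modelled timer core: dequeue, run the callback, and re-queue
 * only if the callback returns MODEL_RESTART. */
static void model_timer_fire(struct model_timer *t, uint64_t now_ns)
{
    t->enqueued = 0;
    if (t->fn(t, now_ns) == MODEL_RESTART)
        t->enqueued = 1;
}

/* Callback in the restart style: it only updates the expiry and
 * reports its decision; it never calls model_timer_start() itself. */
static enum model_restart undelay_cb(struct model_timer *t, uint64_t now_ns)
{
    enum model_restart ret = MODEL_NORESTART;

    if (t->delay_ns) {
        t->expires_ns = now_ns + t->delay_ns;
        t->delay_ns = 0;
        ret = MODEL_RESTART;
    }

    t->work_done++;   /* the remaining work still runs in both cases */
    return ret;
}
```

In this model a timer armed once with model_timer_start() and fired with a pending delay ends up queued again with the new expiry, while start_calls stays at 1. That is the point of the pattern: re-arming is left to the timer core instead of being done from inside the running callback.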