netdev - Re: 2.6.24 BUG: soft lockup

Message-Id: <20080327.173418.18777696.davem@davemloft.net>
Date:	Thu, 27 Mar 2008 17:34:18 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	Matheos.Worku@....COM
Cc:	jesse.brandeburg@...el.com, jarkao2@...il.com,
	netdev@...r.kernel.org, herbert@...dor.apana.org.au,
	hadi@...erus.ca
Subject: Re: 2.6.24 BUG: soft lockup - CPU#X

From: Matheos Worku <Matheos.Worku@....COM>
Date: Thu, 27 Mar 2008 17:19:42 -0700

> Actually I am running a version of the nxge driver which uses only one 
> TX ring, no LLTX enabled so the driver does single threaded TX.

Ok.

> On the other hand, uperf (or iperf, netperf ) is running multiple TX
> connections in parallel and the connections are bound on multiple
> processors, hence they are running in parallel.

Yes, this is what I was interested in.

I think I know what's wrong.

If one cpu gets into the "qdisc remove, give to device" loop, other
cpus will simply add to the qdisc and that first cpu will do all
of the TX processing.

This helps performance, but in your case it is clear that if the
device is fast enough and there are enough other cpus generating TX
traffic, it is quite trivial for a cpu to get wedged there and never exit.
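To make the "fast device, many senders" condition concrete, here is a toy discrete-time model. It is an illustrative assumption, not a description of real NIC or scheduler timing. Per loop iteration the draining cpu moves one packet from the qdisc (length Q, starting with Q_0 ≥ 1) to the TX ring (occupancy R, starting at R_0 = 0, capacity C ≥ 1), the hardware completes h packets, and the other cpus enqueue p packets in total; the loop exits on the two conditions named for __qdisc_run():

```latex
\begin{aligned}
Q_{t+1} &= Q_t - 1 + p, \\
R_{t+1} &= \max\bigl(0,\; R_t + 1 - h\bigr), \\
\text{exit at step } t &\iff Q_t = 0 \ \lor\ R_t = C .
\end{aligned}
```

If p ≥ 1 then Q_{t+1} ≥ Q_t ≥ 1, so the qdisc never empties; if h ≥ 1 then R_t = 0 < C for every t, so the ring never fills. With both, neither exit condition is ever met and the cpu stays in the loop.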

The code in question is net/sched/sch_generic.c:__qdisc_run(), it just
loops there until the device TX fills up or there are no more packets
in the qdisc queue.
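A toy user-space model of that loop, with made-up names and parameters (nothing here is kernel code): each tick the draining cpu hands one packet to a TX ring of ring_size slots, the "hardware" completes up to hw_per_tick packets, and the other cpus enqueue producers_per_tick new packets. It stops on the two exit conditions described above, or at max_ticks as a safety cap for the simulation.

```c
#include <assert.h>

/*
 * Hypothetical single-threaded model of the drain loop.
 * Returns how many ticks the "draining cpu" spent in the loop.
 */
long ticks_in_drain_loop(long qdisc_len, long ring_size, long hw_per_tick,
                         long producers_per_tick, long max_ticks)
{
    long ring_used = 0;
    long ticks = 0;

    assert(ring_size > 0 && max_ticks > 0);
    while (ticks < max_ticks) {
        /* the two exit conditions: qdisc empty, or device TX ring full */
        if (qdisc_len == 0 || ring_used == ring_size)
            break;
        qdisc_len--;                     /* dequeue one packet ...        */
        ring_used++;                     /* ... and give it to the device */
        /* hardware completes up to hw_per_tick packets from the ring */
        ring_used -= (ring_used < hw_per_tick) ? ring_used : hw_per_tick;
        /* meanwhile the other cpus only enqueue */
        qdisc_len += producers_per_tick;
        ticks++;
    }
    return ticks;
}
```

With no other senders, a 5-packet backlog drains in 5 ticks. With one packet arriving and one completing per tick, the loop only stops because of the max_ticks cap; with a stalled device it stops once the ring fills.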

qdisc_run() (in include/linux/pkt_sched.h) sets
__LINK_STATE_QDISC_RUNNING to tell other cpus that there is a cpu
processing the queue inside of __qdisc_run().

net/core/dev.c:dev_queue_xmit() then goes:

	if (q->enqueue) {
		/* Grab device queue */
		spin_lock(&dev->queue_lock);
		q = dev->qdisc;
		if (q->enqueue) {
			/* reset queue_mapping to zero */
			skb_set_queue_mapping(skb, 0);
			rc = q->enqueue(skb, q);
			qdisc_run(dev);
			spin_unlock(&dev->queue_lock);

			rc = rc == NET_XMIT_BYPASS ? NET_XMIT_SUCCESS : rc;
			goto out;
		}
		spin_unlock(&dev->queue_lock);
	}

The first cpu will get into __qdisc_run(), but the other
ones will just q->enqueue() and exit since the first cpu
has indicated it is processing the qdisc.

I'm not sure how we should fix this at the moment. We want
to keep the behavior, but on the other hand we need to
break out of this so we don't get stuck here for too long.
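The fix is left open here. Purely to illustrate what "breaking out" while keeping the batching behavior could look like, here is one hypothetical shape: cap each drain pass at a packet budget and report that work remains, so the caller can defer the rest instead of spinning. The budget, the enum and both structs are invented for this sketch; it is not a fix proposed in this thread, and how the deferred work gets rescheduled is not modelled.

```c
#include <assert.h>
#include <stddef.h>

enum toy_exit { TOY_EMPTY, TOY_DEV_FULL, TOY_BUDGET };

struct toy_queue { size_t len; };   /* packets waiting in the qdisc */
struct toy_nic   { size_t room; };  /* free TX descriptors          */

/*
 * Drain at most `budget` packets and return why the loop stopped.
 * TOY_BUDGET means packets remain and the caller should defer them
 * rather than keep this cpu in the loop.
 */
enum toy_exit toy_bounded_run(struct toy_queue *q, struct toy_nic *nic,
                              size_t budget, size_t *sent)
{
    assert(budget > 0);
    *sent = 0;
    for (;;) {
        if (q->len == 0)
            return TOY_EMPTY;       /* original exit: nothing queued */
        if (nic->room == 0)
            return TOY_DEV_FULL;    /* original exit: device TX full */
        if (*sent == budget)
            return TOY_BUDGET;      /* new exit: this pass has done enough */
        q->len--;
        nic->room--;
        (*sent)++;
    }
}
```

The budget bounds how long one cpu stays in the loop no matter how fast other cpus enqueue, while each pass still sends a whole batch.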