[NET_SCHED]: explicitly hold dev tx lock

For N CPUs, with full-throttle traffic on all N CPUs funneling traffic
to the same ethernet device, the device's queue lock is contended by
all N CPUs constantly. The TX lock, by contrast, is contended by at
most 2 CPUs.

In the current mode of qdisc operation, when all N CPUs contend for
the dequeue region and one of them (after all that work) enters the
dequeue region, we may end up aborting the path if we are unable to
get the tx lock, and go back to contending for the queue lock. As N
goes up, this gets worse.

The changes in this patch result in a small increase in performance
on a 4-CPU (2x dual-core) machine with no IRQ binding. My tests are
UDP based and keep all 4 CPUs busy for the duration of the test. Both
e1000 and tg3 showed similar behavior. I expect higher gains with
more CPUs.

The summary below shows the resulting pps (packets per second) for
different UDP packet sizes. Note that at around 200 bytes the two
don't seem that much different, and we are approaching wire speed
(with plenty of CPU available; e.g. at 512B the app is sitting at 80%
idle in both cases).

+----------+--------+--------+--------+--------+--------+
| pktsize  |  64B   |  128B  |  256B  |  512B  | 1024B  |
+----------+--------+--------+--------+--------+--------+
| Original | 467482 | 463061 | 388267 | 216308 | 114704 |
| txlock   | 468922 | 464060 | 388298 | 216316 | 114709 |
+----------+--------+--------+--------+--------+--------+

Signed-off-by: Jamal Hadi Salim

---
commit b0e36991c5850dfe930f80ee508b08fdcabc18d1
tree b1787bba26f80a325298f89d1ec882cc5ab524ae
parent 42765047105fdd496976bc1784d22eec1cd9b9aa
author Jamal Hadi Salim Sun, 23 Sep 2007 09:09:17 -0400
committer Jamal Hadi Salim Sun, 23 Sep 2007 09:09:17 -0400

 net/sched/sch_generic.c |   19 ++-----------------
 1 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e970e8e..95ae119 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -134,34 +134,19 @@ static inline int qdisc_restart(struct net_device *dev)
 {
 	struct Qdisc *q = dev->qdisc;
 	struct sk_buff *skb;
-	unsigned lockless;
 	int ret;

 	/* Dequeue packet */
 	if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL))
 		return 0;

-	/*
-	 * When the driver has LLTX set, it does its own locking in
-	 * start_xmit. These checks are worth it because even uncongested
-	 * locks can be quite expensive. The driver can do a trylock, as
-	 * is being done here; in case of lock contention it should return
-	 * NETDEV_TX_LOCKED and the packet will be requeued.
-	 */
-	lockless = (dev->features & NETIF_F_LLTX);
-
-	if (!lockless && !netif_tx_trylock(dev)) {
-		/* Another CPU grabbed the driver tx lock */
-		return handle_dev_cpu_collision(skb, dev, q);
-	}

 	/* And release queue */
 	spin_unlock(&dev->queue_lock);

+	HARD_TX_LOCK(dev, smp_processor_id());
 	ret = dev_hard_start_xmit(skb, dev);
-
-	if (!lockless)
-		netif_tx_unlock(dev);
+	HARD_TX_UNLOCK(dev);

 	spin_lock(&dev->queue_lock);
 	q = dev->qdisc;
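
For reference, HARD_TX_LOCK()/HARD_TX_UNLOCK() are not introduced by
this patch; they already exist in net/core/dev.c, where dev_queue_xmit()
uses them. A rough sketch of their behavior in kernels of this era
(paraphrased, not quoted verbatim; exact field and helper names varied
across versions):

/*
 * Sketch of the pre-existing macros from net/core/dev.c.
 * Only take the device tx lock when the driver has NOT set
 * NETIF_F_LLTX (i.e. when it does not do its own locking in
 * start_xmit). netif_tx_lock() records the owning CPU; the cpu
 * argument is historical and in some versions is used to set
 * dev->xmit_lock_owner directly.
 */
#define HARD_TX_LOCK(dev, cpu) {			\
	if ((dev->features & NETIF_F_LLTX) == 0) {	\
		netif_tx_lock(dev);			\
	}						\
}

#define HARD_TX_UNLOCK(dev) {				\
	if ((dev->features & NETIF_F_LLTX) == 0) {	\
		netif_tx_unlock(dev);			\
	}						\
}

The relevant difference from the removed code is that netif_tx_lock()
spins until it acquires the lock, whereas the old netif_tx_trylock()
path aborted on contention and requeued the packet, sending the CPU
back to fight for the queue lock.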