Message-ID: <1321592540.2444.31.camel@edumazet-laptop>
Date: Fri, 18 Nov 2011 06:02:20 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: Tom Herbert <therbert@...gle.com>
Cc: Andy Fleming <afleming@...il.com>, Dave Taht <dave.taht@...il.com>,
Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: root_lock vs. device's TX lock
On Thursday, 17 November 2011 at 16:35 -0800, Tom Herbert wrote:
> > Actually, I'm interested in circumventing *both* locks. Our SoC has
> > some quite-versatile queueing infrastructure, such that (for many
> > queueing setups) we can do all of the queueing in hardware, using
> > per-cpu access portals. By hacking around the qdisc lock, and using a
> > tx queue per core, we were able to achieve a significant speedup.
If packet reordering is not a concern (or is taken into account by the
hardware)... A task sending a TCP flow can migrate from cpu1 to cpu2...
> This was actually one of the motivations for my question. If we have
> one TX queue per core, and use a trivial mq-aware qdisc for
> instance, the locking becomes mostly overhead. I don't mind taking a
> lock once per TX, but right now we're taking three! (root lock twice,
> and device lock once).
>
> Even without one TX queue per core, I think the overhead savings may
> still be present. Eric, I realize that a point of dropping the root
> lock in sch_direct_xmit is to possibly allow queuing to the qdisc and
> device xmit to proceed in parallel, but if you're using a trivial
> qdisc then the time in the qdisc may be << the time in device xmit,
> so the overhead of locking could outweigh the gains in parallelism.
> At the very least, this benefit is hugely variable depending on the
> qdisc used.
Andy speaks more of a way to bypass the qdisc (going direct to the
device), while I thought Tom wanted to optimize htb/cbq/... complex
qdisc handling...
Note we have the LLTX thing to avoid taking dev->lock for some devices.
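
For reference, the LLTX check looks roughly like this; a sketch from
memory of the HARD_TX_LOCK/HARD_TX_UNLOCK macros in net/core/dev.c of
this era, not a verbatim copy:

	/* Sketch of the LLTX check (modeled on HARD_TX_LOCK in
	 * net/core/dev.c). Drivers that declare NETIF_F_LLTX do
	 * their own locking internally, so the core skips the
	 * per-queue _xmit_lock entirely for them.
	 */
	#define HARD_TX_LOCK(dev, txq, cpu) {			\
		if ((dev->features & NETIF_F_LLTX) == 0) {	\
			__netif_tx_lock(txq, cpu);		\
		}						\
	}

	#define HARD_TX_UNLOCK(dev, txq) {			\
		if ((dev->features & NETIF_F_LLTX) == 0) {	\
			__netif_tx_unlock(txq);			\
		}						\
	}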
We could have a qdisc->fast_enqueue() method for some (qdisc/device)
combinations, able to short-circuit __dev_xmit_skb() and do their own
stuff (using RCU or another synchronization method, per-cpu byte/packet
counter stats...)
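
A purely hypothetical sketch of the idea (no fast_enqueue hook exists
in the kernel; the hook name and its placement in __dev_xmit_skb() are
made up to illustrate the shape of it):

	/* Hypothetical only: a (qdisc, device) pair that can queue
	 * in hardware would install fast_enqueue, and the core would
	 * test it before ever touching the qdisc root lock. The fast
	 * path runs under rcu_read_lock() from the caller and keeps
	 * its stats in per-cpu counters instead of under the lock.
	 */
	struct Qdisc_ops {
		/* ... existing ops ... */
		int (*fast_enqueue)(struct sk_buff *skb, struct Qdisc *q);
	};

	static int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
				  struct net_device *dev,
				  struct netdev_queue *txq)
	{
		if (q->ops->fast_enqueue)	/* lockless fast path */
			return q->ops->fast_enqueue(skb, q);

		/* ... otherwise the usual root_lock handling ... */
	}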
But if the device is really that smart, why even use a qdisc in the
first place?
If no qdisc is needed: we take only one lock (the dev TX lock) to
transmit a packet. If the device is LLTX, no lock is taken at all.
If a qdisc is needed: we must take the qdisc root lock to enqueue the
packet. Then, _if_ we own the __QDISC___STATE_RUNNING flag, we enter
the loop that dequeues and xmits packets (__qdisc_run() /
sch_direct_xmit() doing the insane lock flips).
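
The flips in question look roughly like this; an abridged sketch from
memory of sch_direct_xmit() (net/sched/sch_generic.c), with the error
and requeue paths omitted:

	/* Called with root_lock held and the skb already dequeued
	 * from the qdisc.
	 */
	int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
			    struct net_device *dev,
			    struct netdev_queue *txq,
			    spinlock_t *root_lock)
	{
		int ret = NETDEV_TX_BUSY;

		/* Release the qdisc root lock so other cpus can keep
		 * enqueueing while this cpu talks to the device... */
		spin_unlock(root_lock);

		/* ...grab the device TX lock (skipped for LLTX)... */
		HARD_TX_LOCK(dev, txq, smp_processor_id());
		if (!netif_tx_queue_frozen_or_stopped(txq))
			ret = dev_hard_start_xmit(skb, dev, txq);
		HARD_TX_UNLOCK(dev, txq);

		/* ...and flip back to the root lock to dequeue the
		 * next packet: the lock we pay twice per packet. */
		spin_lock(root_lock);
		/* ... */
		return ret;
	}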
It's a tough choice: do we want to avoid false sharing in the device
itself, or in the qdisc...
The current handling was designed so that one cpu feeds the device
while the other cpus feed the qdisc.