Date:	Fri, 18 Nov 2011 06:02:20 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Tom Herbert <therbert@...gle.com>
Cc:	Andy Fleming <afleming@...il.com>, Dave Taht <dave.taht@...il.com>,
	Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: root_lock vs. device's TX lock

On Thursday 17 November 2011 at 16:35 -0800, Tom Herbert wrote:
> > Actually, I'm interested in circumventing *both* locks. Our SoC has
> > some quite-versatile queueing infrastructure, such that (for many
> > queueing setups) we can do all of the queueing in hardware, using
> > per-cpu access portals. By hacking around the qdisc lock, and using a
> > tx queue per core, we were able to achieve a significant speedup.

If packet reordering is not a concern (or is taken into account by the
hardware)... A task sending a TCP flow can migrate from cpu1 to cpu2,
so with one TX queue per cpu its packets can land on different queues
and be reordered...

> This was actually one of the motivations for my question.  If we have
> one TX queue per core, and use a trivial mq-aware qdisc for
> instance, the locking becomes mostly overhead.  I don't mind taking a
> lock once per TX, but right now we're taking three! (root lock twice,
> and device lock once).
> 


> Even without one TX queue per core, I think the overhead savings may
> still be present.  Eric, I realize that a point of dropping the root
> lock in sch_direct_xmit is to possibly allow queueing to the qdisc and
> device xmit to proceed in parallel, but if you're using a trivial
> qdisc then the time in the qdisc may be << time in device xmit, so the
> overhead of locking could outweigh the gains in parallelism.  At the
> very least, this benefit is hugely variable depending on the qdisc used.

Andy is talking more about a way to bypass the qdisc entirely (going
direct to the device), while I thought Tom wanted to optimize
htb/cbq/... complex qdisc handling...

Note we have the LLTX thing to avoid taking dev->lock for some devices.
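
Roughly, what LLTX buys us, as a self-contained user-space sketch
(simplified stand-in names; in the kernel it is the NETIF_F_LLTX
feature flag tested by HARD_TX_LOCK):

        /* User-space model only: the stack takes the per-queue TX lock
         * unless the driver has declared lock-less TX and does its own
         * internal locking. */
        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>

        struct txq {
                pthread_mutex_t lock;
                bool lltx;              /* driver does its own TX locking */
        };

        static void hard_xmit(struct txq *q, int pkt)
        {
                if (!q->lltx)
                        pthread_mutex_lock(&q->lock);
                printf("xmit packet %d\n", pkt);  /* stands in for the driver xmit */
                if (!q->lltx)
                        pthread_mutex_unlock(&q->lock);
        }

        int main(void)
        {
                struct txq q = { PTHREAD_MUTEX_INITIALIZER, false };

                hard_xmit(&q, 1);       /* locked path */
                q.lltx = true;
                hard_xmit(&q, 2);       /* lock-less path */
                return 0;
        }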

We could have a qdisc->fast_enqueue() method for some (qdisc, device)
combinations, able to short-circuit __dev_xmit_skb() and do their own
thing (using RCU or another synchronization method, per-cpu
bytes/packets counters for stats, ...)
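
As a rough user-space sketch of what such a hook could look like (none
of these names exist in the kernel today, the types are simplified
stand-ins):

        #include <pthread.h>
        #include <stddef.h>
        #include <stdio.h>

        struct sketch_qdisc;

        struct sketch_qdisc_ops {
                int (*enqueue)(struct sketch_qdisc *q, int pkt);       /* classic path */
                int (*fast_enqueue)(struct sketch_qdisc *q, int pkt);  /* optional shortcut */
        };

        struct sketch_qdisc {
                const struct sketch_qdisc_ops *ops;
                pthread_mutex_t root_lock;
        };

        /* Analogue of __dev_xmit_skb(): prefer the shortcut if the qdisc
         * provides one, otherwise take the root lock and enqueue. */
        static int dev_xmit_sketch(struct sketch_qdisc *q, int pkt)
        {
                if (q->ops->fast_enqueue)
                        return q->ops->fast_enqueue(q, pkt);

                pthread_mutex_lock(&q->root_lock);
                int ret = q->ops->enqueue(q, pkt);
                pthread_mutex_unlock(&q->root_lock);
                return ret;
        }

        static int slow_enqueue(struct sketch_qdisc *q, int pkt)
        {
                (void)q;
                printf("enqueued %d under root_lock\n", pkt);
                return 0;
        }

        static int lockless_enqueue(struct sketch_qdisc *q, int pkt)
        {
                (void)q;
                printf("fast path for %d, no root_lock\n", pkt);
                return 0;
        }

        int main(void)
        {
                struct sketch_qdisc_ops slow = { slow_enqueue, NULL };
                struct sketch_qdisc_ops fast = { slow_enqueue, lockless_enqueue };
                struct sketch_qdisc q = { &slow, PTHREAD_MUTEX_INITIALIZER };

                dev_xmit_sketch(&q, 1);
                q.ops = &fast;
                dev_xmit_sketch(&q, 2);
                return 0;
        }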

But if the device is really that smart, why even use a qdisc in the
first place?

If a qdisc is not needed: we take only one lock (the device lock) to
transmit a packet. If the device is LLTX, no lock is taken at all.

If a qdisc is needed: we must take the qdisc root lock to enqueue the
packet. Then, _if_ we own the __QDISC_STATE_RUNNING flag, we enter the
loop that dequeues and xmits packets (__qdisc_run() / sch_direct_xmit()
doing the insane lock flips).
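
A rough user-space model of that qdisc path (simplified stand-in types,
single-threaded, the real flag is __QDISC_STATE_RUNNING); it also shows
where Tom's count comes from: root lock taken twice, device lock once
per packet:

        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>

        struct sketch_q {
                pthread_mutex_t root_lock;
                pthread_mutex_t dev_lock;
                bool running;            /* the RUNNING-owner flag */
                int backlog;             /* stand-in for the queued skbs */
        };

        static void qdisc_path(struct sketch_q *q, int pkt)
        {
                /* 1st root_lock take: enqueue */
                pthread_mutex_lock(&q->root_lock);
                q->backlog++;
                bool owner = !q->running;
                if (owner)
                        q->running = true;      /* we become the dequeue loop */

                while (owner && q->backlog > 0) {
                        q->backlog--;
                        pthread_mutex_unlock(&q->root_lock);   /* drop for the driver */

                        pthread_mutex_lock(&q->dev_lock);      /* device TX lock */
                        printf("xmit %d\n", pkt);
                        pthread_mutex_unlock(&q->dev_lock);

                        pthread_mutex_lock(&q->root_lock);     /* 2nd root_lock take */
                }
                if (owner)
                        q->running = false;
                pthread_mutex_unlock(&q->root_lock);
        }

        int main(void)
        {
                struct sketch_q q = {
                        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
                        false, 0
                };

                qdisc_path(&q, 1);
                return 0;
        }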

It's a tough choice: do we want to avoid false sharing in the device
itself, or in the qdisc...

The current handling was designed so that one cpu feeds the device
while the other cpus feed the qdisc.



