netdev - [STRAW MAN PATCH] sch_teql doesn't load-balance ppp(oatm) slaves

Message-ID: <1332450218.32446.79.camel@shinybook.infradead.org>
Date:	Thu, 22 Mar 2012 21:03:38 +0000
From:	David Woodhouse <dwmw2@...radead.org>
To:	netdev@...r.kernel.org
Subject: [STRAW MAN PATCH] sch_teql doesn't load-balance ppp(oatm) slaves

ppp_xmit_process() loops, calling skb_dequeue() until it can no longer
push a frame to the channel. In the case of PPPoATM, it's only ever
going to fail to push a frame to the channel when sk->sk_sndbuf is
exceeded on the atm_vcc. We have a *huge* hidden queue there. (Reducing
the send buffer size to 4KiB with a hack in pppoatm_assign_vcc() didn't
fix the teql problem either.)
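To make the "hidden queue" point concrete, here is a toy userspace model in C. It is a sketch under stated assumptions, not kernel code. The names model_channel, model_push() and model_xmit_process() are hypothetical, sndbuf stands in for sk->sk_sndbuf on the atm_vcc, and the byte budget is the channel's only source of back-pressure. As long as that budget is large, a single pass of the loop drains everything queued upstream, so the scheduler above never sees a backlog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PPPoATM channel; not kernel code. */
struct model_channel {
    size_t sndbuf; /* plays the role of sk->sk_sndbuf on the atm_vcc */
    size_t queued; /* bytes already accepted but not yet sent on the wire */
};

/* The channel accepts a frame unless doing so would exceed the send
 * buffer budget; this is its only source of back-pressure. */
static bool model_push(struct model_channel *ch, size_t len)
{
    if (ch->queued + len > ch->sndbuf)
        return false;
    ch->queued += len;
    return true;
}

/* Loop in the spirit of ppp_xmit_process(): keep taking packets from the
 * upstream queue (here just a count of equal-sized packets) until the
 * channel refuses one. Returns how many packets were taken. What the real
 * code does with a refused frame is not modelled. */
static size_t model_xmit_process(struct model_channel *ch,
                                 size_t *upstream_pkts, size_t pkt_len)
{
    size_t taken = 0;

    assert(pkt_len > 0);
    while (*upstream_pkts > 0 && model_push(ch, pkt_len)) {
        (*upstream_pkts)--;
        taken++;
    }
    return taken;
}
```

With a 1 MiB budget and 1500-byte packets, all 100 queued packets are taken in one call; with a 4096-byte budget the loop stops after two.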

teql_dequeue() will *always* give up a skb when it's called, if there is
one. If there's *not*, and the tx queue becomes empty, then the device
for which teql_dequeue() was called is 'promoted' to the front of the
line (master->slaves). That device will receive the next packet that
comes in, even if there are other devices which are *also* idle and
waiting for packets. Whenever a new packet comes in, the *last* device
to call teql_dequeue() gets it.
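The promotion behaviour can be modelled the same way. This is a hypothetical userspace sketch, not sch_teql.c: head stands in for master->slaves, model_dequeue_empty() is a slave calling teql_dequeue() with nothing queued, and model_packet_arrives() hands the next packet to head. It also assumes, as a round-robin scheduler would, that the master moves head on to the next slave after each successful transmit. model_run() encodes the timing described above: the link that just sent a packet polls again before the next packet arrives.

```c
#include <assert.h>

#define NSLAVES 2

/* Hypothetical model of the teql master's "next slave" pointer. */
struct teql_model {
    int head;          /* stands in for master->slaves */
    int sent[NSLAVES]; /* packets handed to each slave */
};

/* A slave calls dequeue and has nothing queued: it is promoted to the
 * front of the line, whoever else is also idle. */
static void model_dequeue_empty(struct teql_model *m, int slave)
{
    assert(slave >= 0 && slave < NSLAVES);
    m->head = slave;
}

/* A packet arrives at the master: it goes to the current head, and
 * (assumed, round-robin style) head then advances to the next slave. */
static int model_packet_arrives(struct teql_model *m)
{
    int chosen = m->head;

    m->sent[chosen]++;
    m->head = (chosen + 1) % NSLAVES;
    return chosen;
}

/* Timing from the mail: after each packet, the slave that sent it polls
 * again before the next packet arrives, so it re-promotes itself. */
static void model_run(struct teql_model *m, int npackets)
{
    for (int i = 0; i < npackets; i++) {
        int s = model_packet_arrives(m);
        model_dequeue_empty(m, s);
    }
}
```

If slave 1 and then slave 0 poll while idle, and ten packets follow, all ten go to slave 0: the round-robin advance is undone by the re-promotion every time.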

I have a system with two ADSL lines, using PPPoATM and teql. Because of
the behaviour of teql described above, it only seems to use *one* of the
uplinks at a time. One link will sit idle for seconds at a time, until
the ATM socket send buffer fills or we get lucky with the timing and
traffic flips to the other device.

My simple 'fix' for this is as follows: if *another* device is already
waiting with its tx queue empty, then teql_dequeue() should *not* return
a new packet to its caller. It may not be the best fix; it may not even
be correct. But it works, and I finally get the full upload bandwidth
of both lines rather than only one at a time. The ISP lets me take
a 10-second dump of the traffic on my bonded lines, and I now see it
properly interleaved between the two lines, making full use of both
uplinks.
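The effect of the check can be sketched with a toy userspace model (hypothetical names, not kernel code). model_dequeue_empty_fixed() refuses to promote its caller when the current head is a different slave whose own queue is empty, mirroring the condition in the patch; queue_empty[] stands in for qdisc_peek_head() returning NULL. The model assumes the master advances head to the next slave after each transmit, and that the slave which just sent a packet polls again before the next one arrives.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLAVES 2

/* Hypothetical model, not kernel code. */
struct teql_model {
    int head;                  /* stands in for master->slaves */
    bool queue_empty[NSLAVES]; /* slave's own queue has nothing pending */
    int sent[NSLAVES];         /* packets handed to each slave */
};

/* Slave calls dequeue with nothing to give it. With the straw-man check,
 * it only promotes itself if the head is not another slave that is
 * already waiting with an empty queue. */
static void model_dequeue_empty_fixed(struct teql_model *m, int slave)
{
    assert(slave >= 0 && slave < NSLAVES);
    if (m->head != slave && m->queue_empty[m->head])
        return;      /* someone else is already waiting: stay put */
    m->head = slave; /* otherwise promote, as unpatched teql does */
}

/* Packet goes to head; head then advances (assumed round-robin). */
static int model_packet_arrives(struct teql_model *m)
{
    int chosen = m->head;

    m->sent[chosen]++;
    m->head = (chosen + 1) % NSLAVES;
    return chosen;
}

/* Every slave's own queue stays empty; the sender polls again right
 * after each packet, before the next packet arrives. */
static void model_run_fixed(struct teql_model *m, int npackets)
{
    for (int s = 0; s < NSLAVES; s++)
        m->queue_empty[s] = true;
    for (int i = 0; i < npackets; i++) {
        int s = model_packet_arrives(m);
        model_dequeue_empty_fixed(m, s);
    }
}
```

With that timing, ten packets alternate between slave 0 and slave 1, five each, instead of piling onto whichever link polled last.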

Anyone got better ideas?

--- net/sched/sch_teql.c~	2012-03-22 15:21:41.000000000 +0000
+++ net/sched/sch_teql.c	2012-03-22 16:42:28.684436315 +0000
@@ -100,6 +100,10 @@ teql_dequeue(struct Qdisc *sch)
 	struct netdev_queue *dat_queue;
 	struct sk_buff *skb;

+	if (dat->m->slaves && dat->m->slaves != sch &&
+	    !qdisc_peek_head(dat->m->slaves)) {
+		return NULL;
+	}
 	skb = __skb_dequeue(&dat->q);
 	dat_queue = netdev_get_tx_queue(dat->m->dev, 0);
 	if (skb == NULL) {

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@...el.com                              Intel Corporation
