Date:	Wed, 28 Nov 2012 16:41:27 +0000
From:	David Woodhouse <dwmw2@...radead.org>
To:	Benjamin LaHaise <bcrl@...ck.org>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Vijay Subramanian <subramanian.vijay@...il.com>,
	David Miller <davem@...emloft.net>, saku@...i.fi,
	rick.jones2@...com, netdev@...r.kernel.org
Subject: Re: TCP and reordering

On Wed, 2012-11-28 at 11:19 -0500, Benjamin LaHaise wrote:
> On Wed, Nov 28, 2012 at 03:47:15PM +0000, David Woodhouse wrote:
> > On Wed, 2012-11-28 at 04:52 -0800, Eric Dumazet wrote:
> > > BQL is nice for high speed adapters.
> > 
> > For adapters with hugely deep queues, surely? There's a massive
> > correlation between the two, of course, but PPP over L2TP or PPPoE
> > ought to be included in the classification, right?
> 
> Possibly, but there are many setups where PPPoE/L2TP do not connect to 
> the congested link directly.

Absolutely. But in the cases where they *do* connect to the congested
link, and the packets are backing up on the *same* host, there's no
excuse for not actually knowing that and behaving appropriately :)

And even if the congested link is somewhere upstream, you'd hope that
something exists (like ECN) to let you know about it.

In the LNS case that I'm most familiar with, the LNS *does* know about
the bandwidth of each customer's ADSL line, and limits the bandwidth of
each session appropriately. It's much better to decide at the LNS which
packets to drop, than to let the telco decide. Or worse, to have the
ADSL link drop one ATM cell out of *every* packet when it's
overloaded...

> This sort of chaining of destructors is going to be very expensive in 
> terms of CPU cycles.  If this does get implemented, please ensure there is 
> a way to turn it off.

You asked that before, and I think we agreed that it would be acceptable
to use the existing CONFIG_BQL option?

I'm looking at adding ppp-channel equivalents of
netdev_{reset,sent,completed}_queue, and having the PPP channels call
them as appropriate. For some channels it's trivial, but in the
PPPoE/L2TP cases, where we want to install destructors without stomping
on TSQ, it'll be substantial enough that it should be compiled out if
CONFIG_BQL isn't enabled.
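To make the idea concrete, here's a minimal user-space sketch of the byte accounting that netdev_{reset,sent,completed}_queue provide, and that ppp-channel equivalents would mirror. The names and the fixed limit are illustrative assumptions, not the real kernel API (the real BQL also sizes the limit dynamically):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified sketch of BQL-style byte accounting for a
 * PPP channel. Names are illustrative; the real helpers live in
 * include/linux/netdevice.h and adjust the limit dynamically. */

struct chan_queue {
	size_t inflight;  /* bytes handed to the channel, not yet completed */
	size_t limit;     /* stop queueing new packets above this */
	int stopped;      /* queue is throttled */
};

static void chan_reset_queue(struct chan_queue *q)
{
	q->inflight = 0;
	q->stopped = 0;
}

static void chan_sent_queue(struct chan_queue *q, size_t bytes)
{
	q->inflight += bytes;
	if (q->inflight >= q->limit)
		q->stopped = 1;       /* tell the stack to stop feeding us */
}

static void chan_completed_queue(struct chan_queue *q, size_t bytes)
{
	q->inflight = bytes > q->inflight ? 0 : q->inflight - bytes;
	if (q->stopped && q->inflight < q->limit)
		q->stopped = 0;       /* wake the queue again */
}
```

The channel would call the sent hook when it queues a frame toward the wire and the completed hook from the skb destructor, which is exactly where the TSQ interaction below comes in.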

> That said, if there is local congestion, the benefits of BQL would be 
> worthwhile to have.

If there is local congestion... *or* if you have proper bandwidth
management on the link to the clients; either by knowing the bandwidth
and voluntarily limiting to it, or by something like ECN.

> > But I wish there was a nicer way to chain destructors. And no, I don't
> > count what GSO does. We can't use the cb here anyway since we're passing
> > it down the stack.
> 
> I think all the tunneling protocols are going to have the same problem 
> here, so it deserves some thought about how to tackle the issue in a 
> generic way without incurring a large amount of overhead. 

Right. There are a few cases of skb->destructor being used at different
levels of the stack where I suspect this might already be an issue, in
fact. And things like TSQ will silently be losing track of packets
because of skb_orphan, even before they've left the box.

Hah, and I note that l2tp is *already* stomping on skb->destructor for
its own purposes. So I could potentially just use its existing callback
and pretend I hadn't seen that it screws up TSQ, and leave the issue of
chaining destructors to be Someone Else's Problem™.

Actually, I think it overwrites the destructor without calling
skb_orphan() first — which will *really* upset TSQ, won't it?
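A toy model of the stomping problem, with a hypothetical chaining helper. The structs and callbacks here are illustrative stand-ins, not the kernel's sk_buff; the point is only that whoever installs a new destructor must either orphan the skb first or save and invoke the displaced callback, or the original owner (TSQ here) never sees its completion:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative-only model of skb->destructor chaining; "sk_buff" here
 * is a toy struct, not the kernel's. */

struct sk_buff;
typedef void (*destructor_t)(struct sk_buff *);

struct sk_buff {
	destructor_t destructor;
	destructor_t chained;     /* saved previous destructor, if any */
	int *tsq_completions;     /* stands in for TSQ's accounting */
	int *tunnel_completions;  /* stands in for the tunnel's accounting */
};

static void tsq_destructor(struct sk_buff *skb)
{
	(*skb->tsq_completions)++;
}

static void tunnel_destructor(struct sk_buff *skb)
{
	(*skb->tunnel_completions)++;
	if (skb->chained)
		skb->chained(skb);    /* run the displaced destructor too */
}

/* Chain: save the old callback before installing ours. Overwriting
 * skb->destructor directly would silently drop tsq_destructor. */
static void install_chained(struct sk_buff *skb, destructor_t d)
{
	skb->chained = skb->destructor;
	skb->destructor = d;
}

static void free_skb(struct sk_buff *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
}
```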

>  This exact 
> problem is one of the reasons multilink PPP often doesn't work well over 
> L2TP or PPPoE as compared to its behaviour over ttys.

Another fun issue with tunnelling protocols and BQL... packets tend to
*grow* as they get encapsulated. So you might end up calling
netdev_sent_queue() with a given size, then netdev_completed_queue()
with a bigger packet later...
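The drift is easy to see in isolation. This toy function assumes a fixed 8-byte PPPoE+PPP header added after the payload bytes were accounted; naive subtraction then goes negative, since each completion reports more bytes than were ever "sent":

```c
#include <assert.h>
#include <stddef.h>

/* Toy illustration of the size-mismatch problem: bytes are counted at
 * queue time (pre-encapsulation) but completed at the post-
 * encapsulation size. ENCAP_OVERHEAD is an assumed, fixed growth. */

enum { ENCAP_OVERHEAD = 8 };   /* hypothetical header growth, bytes */

static long naive_inflight(size_t npackets, size_t payload)
{
	long inflight = 0;
	size_t i;

	for (i = 0; i < npackets; i++)
		inflight += (long)payload;                    /* sent_queue() */
	for (i = 0; i < npackets; i++)
		inflight -= (long)(payload + ENCAP_OVERHEAD); /* completed_queue() */

	return inflight;  /* negative: "completed" more than was sent */
}
```

So the accounting has to be keyed on the size at the time of the sent call (or the skb's size recorded at that point), not the size at completion.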

-- 
dwmw2

