Message-ID: <Pine.LNX.4.64.0811052108250.9159@wrl-59.cs.helsinki.fi>
Date: Wed, 5 Nov 2008 21:46:03 +0200 (EET)
From: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To: Rick Jones <rick.jones2@...com>
cc: Evgeniy Polyakov <zbr@...emap.net>,
David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>, efault@....de, mingo@...e.hu,
a.p.zijlstra@...llo.nl, Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: tbench wrt. loopback TSO
On Wed, 5 Nov 2008, Rick Jones wrote:
> Ilpo Järvinen wrote:
> > On Wed, 5 Nov 2008, Evgeniy Polyakov wrote:
> >
> >
> > >On Wed, Nov 05, 2008 at 02:25:57PM +0200, Ilpo Järvinen
> > >(ilpo.jarvinen@...sinki.fi) wrote:
> > >
> > > >The problem is that we'd need to _resegment with the next skb_ since the
> > > >mss boundary and skb boundary would basically constantly be running
> > > >out-of-sync. That won't get done currently by anything.
> > >
> > >Btw, what's that wrong if there will be sub-mss frame per tso frame?
> >
> >
> > I personally don't consider that to be a big deal... I suppose some see
> > it as bad thing because of the slightly larger header vs data ratio...
> > Which is significant only if you can saturate the link (or have unbounded
> > bandwidth such as with lo), so slower links are more affected than high
> > speed ones...
>
> Can't say that I tend to "like" subMSS segments out there in a bulk
> transfer but some pseudorandom thoughts:
>
> And the worst that would be would be one full MSS and a single byte, getting
> us an average of (MSS+1)/2 (roughly). It only gets better from there
> (2MSS+1)/3, (3MSS+1)/4 etc etc.
...Note that the likelihood of such a 1-byte pathological case is not
that high if one is sending pages... For malicious purposes one could
always use TCP_NODELAY anyway to force similar small segments, so it's
hardly worth considering here.
For the most sensible cases with full pages, the resulting segment
counts are pages * 4096 / mss, for mss 1460 and 1448:

pages  mss 1460  mss 1448
   1    2.80548   2.82873
   2    5.61096   5.65746
   3    8.41644   8.48619
   4   11.2219   11.3149
   5   14.0274   14.1436
   6   16.8329   16.9724
   7   19.6384   19.8011
   8   22.4438   22.6298
   9   25.2493   25.4586
  10   28.0548   28.2873
  11   30.8603   31.116
  12   33.6658   33.9448
  13   36.4712   36.7735
  14   39.2767   39.6022
  15   42.0822   42.4309
  16   44.8877   45.2597
The worst case seems to be 5 pages with mss 1460, which yields a final
segment with only 40 bytes of payload (20480 = 14 * 1460 + 40).
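
For reference, the table is just pages * 4096 / mss; a trivial
user-space sketch reproduces it (plain arithmetic, nothing
kernel-specific):

#include <stdio.h>

int main(void)
{
	const int page = 4096;
	const int mss[2] = { 1460, 1448 };
	int pages, i;

	for (pages = 1; pages <= 16; pages++) {
		int bytes = pages * page;

		printf("%2d", pages);
		for (i = 0; i < 2; i++)
			printf("  %8.6g (tail %4d)",
			       (double)bytes / mss[i], bytes % mss[i]);
		printf("\n");
	}
	return 0;
}

The tail column is the payload left over for the final sub-mss segment;
the 40 byte worst case above is just 5 * 4096 % 1460 == 40.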
> Ignoring the TSO case for a moment, if there is congestion and receiver
> window available and a user makes a > MSS send that isn't an integral
> multiple of the MSS, we don't delay the last subMSS segment do we?
Without TSO, only Nagle could prevent sending that sub-mss portion, so
the answer depends on what the in-flight window consists of.
With TSO, I guess this falls under tcp_tso_should_defer first...
And then, as far as the mss-splitter goes (the one DaveM quoted in this
thread), we send just the full segment if there's enough room in the
receiver window and let tso/gso handle splitting off that sub-mss
portion if necessary... ...except for the last segment, which is
important for keeping track of nagle, and we currently don't handle
nagle correctly if the segment in question has len > mss (as noted
earlier, this could be changed after auditing those len < mss checks).
So it will basically be the same as without tso (it depends on the
in-flight window).
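
To spell out that nagle decision (a simplified sketch with invented
names; not the kernel's actual tcp_nagle_test()):

static int nagle_allows_send(int len, int mss, int pkts_in_flight,
			     int nodelay)
{
	if (len >= mss)			/* "full-sized" segment always goes */
		return 1;
	if (nodelay)			/* TCP_NODELAY disables nagle */
		return 1;
	if (pkts_in_flight == 0)	/* nothing unacked -> flush the tail */
		return 1;
	return 0;			/* sub-mss with data in flight: defer */
}

The len >= mss shortcut is exactly the kind of check the auditing
mentioned above would have to revisit for the len > mss case.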
We _won't do_ nagle for the intermediate sub-mss segments (a design
decision which predates the time I've been tracking kernel
development), and for obvious reasons it couldn't simply be enabled for
them: nagle would then basically stall the transfer, e.g. when the
opposite direction needs to change mss due to reporting sack blocks. In
general those middle sub-mss skbs only occur if the mss changes, since
skbs are currently split on mss boundaries already at write time.
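
To illustrate that last point with hypothetical numbers (one sack block
plus padding eats 12 bytes of option space, which matches the
1460 -> 1448 drop used above):

#include <stdio.h>

int main(void)
{
	int skb_len = 2 * 1460;	/* skb split at write time with old mss */
	int mss = 1448;		/* mss once sack blocks must be carried */

	/* Nothing resegments across the skb boundary, so every such
	 * skb now produces a sub-mss leftover:
	 */
	printf("%d full segments + %d byte tail per skb\n",
	       skb_len / mss, skb_len % mss);
	return 0;
}

...i.e. each pre-split skb now leaves a 24-byte runt in the middle of
the stream.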
--
i.