Date:	Thu, 31 Jul 2008 11:50:50 +0200
From:	Lennert Buytenhek <buytenh@...tstofly.org>
To:	Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Cc:	David Miller <davem@...emloft.net>,
	Netdev <netdev@...r.kernel.org>, akarkare@...vell.com,
	nico@....org, Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: using software TSO on non-TSO capable netdevices

On Thu, Jul 31, 2008 at 10:34:13AM +0300, Ilpo Järvinen wrote:

> > > The hacky patch below (on top of 2.6.27-rc1 + stubbing out the
> > > sk_can_gso() check) reduces the 1 GiB 1000 Mb/s sendfile test from:
> >  ...
> > > I.e. dramatic CPU time improvements, and some overall speedup as well.
> > > 
> > > I wonder if something like this can be done in a less hacky fashion --
> > > the hard part I guess is deciding when to keep coalescing (to reduce
> > > CPU overhead) vs. when to push out what has been coalesced so far (in
> > > order to keep the pipe filled), and I'm not sure I have good ideas
> > > about how to make that decision.
> > 
> > Interesting, I'll take a closer look at this.
> > 
> > Actually your patch is less of a surprise, because one of the issues I
> > had to surmount constantly when rewriting the TSO output path was the
> > implicit conflict between TSO deferral (to accumulate segments) and
> > the nagle logic.
> 
> I think your statement makes very little sense to me (though I had to 
> look up the meaning of surmount but that seems not so significant 
> anyway)... They both work in the same direction, i.e., to delay sending 
> to prevent excessive processing of small bits, but the region of operation 
> shouldn't overlap (nagle works with <mss, and tso deferring logic 
> basically begins from where the nagle ends)?
> 
> It seems to me that this is not about a conflict between TSO deferring and 
> nagle sub-mss logic at all (perhaps there wasn't as direct a relation to 
> this issue as I read...?) AFAICT, the change only makes the (!nonagle && 
> tp->packets_out && tcp_minshall_check(tp)) test in tcp_nagle_check more 
> likely to occur (and result in false), i.e., basically we end up using 
> the nagle test also to prevent sending of >= mss skbs, besides the usual 
> functionality which is to prevent sending in case of < mss sized ones. 
> ...Which seems just an extension to what we checked for in 
> tcp_tso_should_defer().

I wanted a way to get larger GSO segments, and the idea was to rig
the nagle check to consider sub-N*mss frames as small frames and not
let more than one of them into the pipe at any given time.  I don't
know whether the change I made accomplishes exactly that, but it did
end up giving me larger GSO segments, which was the goal.
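
The idea described above can be modelled in user space roughly as follows. This is only a sketch of the stated intent, not the patch itself (which is not included in this message) and not the kernel's tcp_nagle_check(). The struct, the function name and the small_factor parameter are hypothetical, and corking (TCP_NAGLE_CORK) is left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the sender state the check consults. */
struct sender_state {
    unsigned int packets_out;   /* frames currently in flight */
    bool small_frame_unacked;   /* a "small" frame was sent and is not yet ACKed */
};

/*
 * Returns true when the frame should be held back so that more data can be
 * coalesced into it.
 *
 * small_factor == 1 models the ordinary rule (only sub-MSS frames count as
 * small).  small_factor == N models the rigged check, where any frame
 * shorter than N * mss counts as small, so at most one such frame is
 * allowed in flight at a time.
 */
static bool nagle_should_hold(const struct sender_state *st,
                              size_t frame_len, size_t mss,
                              unsigned int small_factor, bool nodelay)
{
    bool is_small = frame_len < (size_t)small_factor * mss;

    if (!is_small)
        return false;   /* large enough: never held by this check */
    if (nodelay)
        return false;   /* Nagle disabled: send immediately */

    /* Hold only if something is in flight and a small frame is still unACKed. */
    return st->packets_out > 0 && st->small_frame_unacked;
}
```

With small_factor set to 1, a 2*mss frame is never held by this check. With small_factor set to 4, the same frame is held while another small frame is unacknowledged, which is the kind of extra coalescing that would lead to larger GSO segments.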

It makes the GSO segment size distribution pretty chaotic, though:

10k seg: 2:851 3:430 4:3385 5:330 6:3611 7:382 8:949 9:18 10:43 11:1
10k size: 5:851 8:430 11:3385 14:330 17:3611 19:382 22:949 25:18 28:43 31:1
10k seg: 2:1952 3:410 4:2855 5:340 6:2956 7:356 8:1059 9:24 10:48
10k size: 5:1952 8:410 11:2855 14:340 17:2956 19:356 22:1059 25:24 28:48
10k seg: 2:1036 3:569 4:4824 5:369 6:2241 7:251 8:643 9:20 10:46 11:1
10k size: 5:1036 8:569 11:4824 14:369 17:2241 19:251 22:643 25:20 28:46 31:1
10k seg: 2:1270 3:408 4:3686 5:350 6:2910 7:319 8:988 9:15 10:54
10k size: 5:1270 8:408 11:3686 14:350 17:2910 19:319 22:988 25:15 28:54
10k seg: 2:870 3:407 4:4211 5:380 6:3392 7:286 8:389 9:20 10:45
10k size: 5:870 8:407 11:4211 14:380 17:3392 19:286 22:389 25:20 28:45
10k seg: 2:1217 3:411 4:3542 5:315 6:3263 7:348 8:832 9:23 10:48 11:1
10k size: 5:1217 8:411 11:3542 14:315 17:3263 19:348 22:832 25:23 28:48 31:1

("10k seg" lines give the distribution of gso_segs over 10k skbuffs,
and "10k size" lines the distribution of skb->len >> 9 over 10k skbuffs;
each entry is value:count.)
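
The size buckets can be related to the segment counts with a little arithmetic. The sketch below assumes an MSS of 1460 bytes and that skb->len equals gso_segs * mss. Neither is stated in this message; 1460 is chosen because it reproduces every size bucket listed above (1448, for example, would put 6 segments in bucket 16 rather than 17). The function names are hypothetical.

```c
#include <stddef.h>

/* Bucket used by the "10k size" lines: skb->len in units of 512 bytes. */
static unsigned int size_bucket(size_t skb_len)
{
    return (unsigned int)(skb_len >> 9);
}

/*
 * Bucket expected for an skb made of `segs` full-sized segments.
 * Assumes skb_len == segs * mss, which is an assumption of this sketch.
 */
static unsigned int bucket_for_segs(unsigned int segs, size_t mss)
{
    return size_bucket((size_t)segs * mss);
}
```

Under those assumptions, 2 through 11 segments map to buckets 5, 8, 11, 14, 17, 19, 22, 25, 28 and 31, which are the keys that appear in the "10k size" lines, with counts equal to those of the corresponding "10k seg" entries.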