netdev - Re: using software TSO on non-TSO capable netdevices


Message-ID: <Pine.LNX.4.64.0807311318120.4551@wrl-59.cs.helsinki.fi>
Date:	Thu, 31 Jul 2008 13:27:14 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Lennert Buytenhek <buytenh@...tstofly.org>
cc:	David Miller <davem@...emloft.net>,
	Netdev <netdev@...r.kernel.org>, akarkare@...vell.com,
	nico@....org, Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: using software TSO on non-TSO capable netdevices

On Thu, 31 Jul 2008, Lennert Buytenhek wrote:

> On Thu, Jul 31, 2008 at 10:34:13AM +0300, Ilpo Järvinen wrote:
> 
> > > > The hacky patch below (on top of 2.6.27-rc1 + stubbing out the
> > > > sk_can_gso() check) reduces the 1 GiB 1000 Mb/s sendfile test from:
> > >  ...
> > > > I.e. dramatic CPU time improvements, and some overall speedup as well.
> > > > 
> > > > I wonder if something like this can be done in a less hacky fashion --
> > > > the hard part I guess is deciding when to keep coalescing (to reduce
> > > > CPU overhead) vs. when to push out what has been coalesced so far (in
> > > > order to keep the pipe filled), and I'm not sure I have good ideas
> > > > about how to make that decision.
> > > 
> > > Interesting, I'll take a closer look at this.
> > > 
> > > Actually your patch is less of a surprise, because one of the issues I
> > > had to surmount constantly when rewriting the TSO output path was the
> > > implicit conflict between TSO deferral (to accumulate segments) and
> > > the nagle logic.
> > 
> > Your statement makes very little sense to me (though I had to 
> > look up the meaning of "surmount", but that seems not so significant 
> > anyway)... They both work in the same direction, i.e., to delay sending 
> > to prevent excessive processing of small bits, but their regions of 
> > operation shouldn't overlap (nagle works with < mss, and the tso 
> > deferring logic basically begins where nagle ends)?
> > 
> > It seems to me that this is not about a conflict between TSO deferring 
> > and the nagle sub-mss logic at all (perhaps there wasn't as direct a 
> > relation to this issue as I read...?). AFAICT, the change only makes the 
> > (!nonagle && tp->packets_out && tcp_minshall_check(tp)) test in 
> > tcp_nagle_check more likely to occur (and result in false), i.e., 
> > basically we end up using the nagle test also to prevent sending of 
> > >= mss skbs, besides the usual functionality, which is to prevent 
> > sending of < mss sized ones. ...Which seems just an extension to what 
> > we checked for in tcp_tso_should_defer().
> 
> I wanted a way to get larger GSO segments, and the idea was to rig
> the nagle check to consider sub-N*mss frames as small frames and not
> let more than one of them into the pipe at any given time.  I don't
> know whether the change I made accomplishes exactly that, but it did
> end up giving me larger GSO segments, which was the goal.
>
> It makes the GSO segment size distribution pretty chaotic, though:

Your test accomplishes that only if there's a small segment in the 
outstanding window, i.e., snd_sml points into the outstanding window (or 
packets_out is zero, but that's probably not relevant).
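
To make the snd_sml dependency concrete, here is a rough userspace model 
of the condition quoted above. It is only a sketch: struct tcp_model and 
the helper names are made up for illustration and are not the kernel's 
definitions, and nonagle is reduced to a plain bool although the kernel 
treats it as a set of flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct tcp_sock fields discussed here. */
struct tcp_model {
    uint32_t snd_una;     /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;     /* next sequence number to be sent */
    uint32_t snd_sml;     /* end sequence of the last sub-mss segment sent */
    uint32_t packets_out; /* segments currently in flight */
};

/* Wraparound-safe "a is after b" on 32-bit sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* True while the last small segment still lies in the outstanding window,
 * i.e. snd_una < snd_sml <= snd_nxt in sequence space. */
static bool small_segment_in_flight(const struct tcp_model *tp)
{
    return seq_after(tp->snd_sml, tp->snd_una) &&
           !seq_after(tp->snd_sml, tp->snd_nxt);
}

/* Model of (!nonagle && tp->packets_out && tcp_minshall_check(tp)). */
static bool nagle_small_blocked(const struct tcp_model *tp, bool nonagle)
{
    return !nonagle && tp->packets_out > 0 && small_segment_in_flight(tp);
}
```

Once the small segment has been acked (snd_sml no longer after snd_una), 
the condition is false regardless of how much is queued, which is why the 
coalescing only kicks in some of the time.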

Why not experiment with modifying tcp_tso_should_defer instead, to make 
it fully independent of snd_sml (the existence of a sub-mss skb in flight)? 
Just make sure you don't try to defer past what min(tp->snd_cwnd, 
tcp_wnd_end(tp)) can give you at most (in theory you could apply some 
optimism and go even above that in slow start, but that's not going to be 
a very robust approach :-)).
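
A minimal sketch of such a bound, as a userspace model rather than a 
patch. The function names, the byte-based "want" target and the 
in_flight guard are all assumptions made for illustration; snd_wnd 
stands for the receiver window in bytes, i.e. the distance from snd_una 
to tcp_wnd_end(tp), and the cwnd is converted from packets to bytes so 
the two limits are comparable.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Largest burst, in bytes, that the congestion window (counted in
 * packets) and the receiver window (counted in bytes) could ever allow. */
static uint32_t max_burst_bytes(uint32_t snd_cwnd, uint32_t mss,
                                uint32_t snd_wnd)
{
    uint64_t cwnd_bytes = (uint64_t)snd_cwnd * mss; /* avoid 32-bit overflow */

    if (cwnd_bytes > UINT32_MAX)
        cwnd_bytes = UINT32_MAX;
    return min_u32((uint32_t)cwnd_bytes, snd_wnd);
}

/* Hypothetical policy: keep coalescing while less than the wanted amount
 * is queued, but never wait for more than the windows can provide, and
 * only while something is in flight so that an ACK will retrigger us. */
static bool should_defer(uint32_t queued, uint32_t want,
                         uint32_t ceiling, uint32_t in_flight)
{
    uint32_t goal = min_u32(want, ceiling);

    return in_flight > 0 && queued < goal;
}
```

With snd_cwnd = 10, mss = 1448 and a 65535-byte receiver window the 
ceiling is 14480 bytes, so asking for a 64 KiB super-segment would stop 
deferring at 14480 bytes queued instead of waiting forever.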


-- 
 i.