netdev - Re: [RFC] TCP illinois max rtt aging

Date:	Fri, 07 Dec 2007 04:41:50 -0800 (PST)
From:	David Miller <davem@...emloft.net>
To:	ilpo.jarvinen@...sinki.fi
Cc:	lachlan.andrew@...il.com, netdev@...r.kernel.org,
	quetchen@...tech.edu
Subject: Re: [RFC] TCP illinois max rtt aging

From: "Ilpo_Järvinen" <ilpo.jarvinen@...sinki.fi>
Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)

> I guess if you get a large cumulative ACK, the amount of processing is 
> still overwhelming (added DaveM if he has some idea how to combat it).
> 
> Even a simple scenario (this isn't anything fancy at all, will occur all 
> the time): Just one loss => rest skbs grow one by one into a single 
> very large SACK block (and we do that efficiently for sure) => then the 
> fast retransmit gets delivered and a cumulative ACK for whole orig_window 
> arrives => clean_rtx_queue has to do a lot of processing. In this case we 
> could optimize RB-tree cleanup away (by just blanking it all) but still 
> getting rid of all those skbs is going to take a larger moment than I'd 
> like to see.
> 
> That tree blanking could be extended to cover anything which ACK more than 
> half of the tree by just replacing the root (and dealing with potential 
> recolorization of the root).

Yes, it's the classic problem.  But it ought to be at least
partially masked when TSO is in use, because we'll only process
a handful of SKBs.  The more effectively TSO batches, the
less work clean_rtx_queue() will do.

When not doing TSO the behavior is super-stupid: we bump reference
counts on the same page multiple times while running over the SKBs,
since consecutive SKBs cover data in different spans of the same
page.
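
To put rough numbers on the two paragraphs above, here is a minimal
sketch in plain userspace C. It is a simplified model, not kernel code:
it assumes the payload is laid out contiguously from page offset 0 and
that each SKB holds one reference per page its span touches. The helper
names and the sizes used (1448-byte MSS, 4096-byte pages) are
illustrative assumptions, not taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Number of SKBs needed to carry `bytes` of payload when each SKB
 * holds at most `per_skb` bytes (hypothetical helper). */
static size_t skbs_for_window(size_t bytes, size_t per_skb)
{
    assert(per_skb > 0);
    return (bytes + per_skb - 1) / per_skb;
}

/* Total page references held by those SKBs, assuming the payload is
 * contiguous from offset 0 and each SKB takes one reference on every
 * page its byte span touches (simplified model). */
static size_t page_refs_for_window(size_t bytes, size_t per_skb,
                                   size_t page_size)
{
    size_t refs = 0;
    size_t off = 0;

    assert(per_skb > 0 && page_size > 0);
    while (off < bytes) {
        size_t left = bytes - off;
        size_t len = left < per_skb ? left : per_skb;
        size_t first = off / page_size;
        size_t last = (off + len - 1) / page_size;

        refs += last - first + 1;   /* pages touched by this SKB */
        off += len;
    }
    return refs;
}
```

In this model, 8192 bytes split into MSS-sized SKBs gives 6 SKBs holding
7 references on just 2 pages, while a single large SKB covering the same
data holds 2. The number of SKBs clean_rtx_queue() has to walk shrinks
by the same kind of factor.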

The core issue is that we have a poorly behaving data container,
and therefore that's obviously what we need to change.

Conceptually what we probably need to do is separate the data
maintenance from the SKB objects themselves.  There is a blob
that maintains the paged data state for everything in the
retransmit queue.  SKBs are built and get the page pointers
but don't actually grab references to the pages; the blob
does that, and it keeps track of how many SKB references to each
page there are, non-atomically.
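
A minimal userspace sketch of that idea, under stated assumptions: the
names (rtx_blob, rtx_blob_attach, rtx_blob_detach), the fixed-size slot
array, and the void * stand-in for a page are all hypothetical, and the
plain counters assume the caller already serializes access (for example
by holding the socket lock).  The real_gets and real_puts fields only
count how often a real page reference would be taken or dropped.  It
does not attempt to handle clones that are still alive in the driver
output path.

```c
#include <assert.h>
#include <stddef.h>

#define RTX_BLOB_MAX_PAGES 64   /* hypothetical fixed capacity */

struct rtx_page_slot {
    const void *page;           /* stand-in for a struct page pointer */
    unsigned int skb_users;     /* plain count, caller serializes access */
};

struct rtx_blob {
    struct rtx_page_slot slots[RTX_BLOB_MAX_PAGES];
    size_t nr_slots;
    unsigned long real_gets;    /* times a real page ref would be taken */
    unsigned long real_puts;    /* times a real page ref would be dropped */
};

static struct rtx_page_slot *rtx_blob_find(struct rtx_blob *b,
                                           const void *page)
{
    for (size_t i = 0; i < b->nr_slots; i++)
        if (b->slots[i].page == page)
            return &b->slots[i];
    return NULL;
}

/* An SKB starts pointing at `page`.  Only the first user of a page
 * costs a real reference; later users bump the plain counter.
 * Returns 0 on success, -1 if the blob is full. */
static int rtx_blob_attach(struct rtx_blob *b, const void *page)
{
    struct rtx_page_slot *s = rtx_blob_find(b, page);

    if (s) {
        s->skb_users++;
        return 0;
    }
    if (b->nr_slots == RTX_BLOB_MAX_PAGES)
        return -1;
    s = &b->slots[b->nr_slots++];
    s->page = page;
    s->skb_users = 1;
    b->real_gets++;
    return 0;
}

/* An SKB that pointed at `page` is freed.  The real reference is
 * dropped only when the last user goes away.
 * Returns 0 on success, -1 if the page is not tracked. */
static int rtx_blob_detach(struct rtx_blob *b, const void *page)
{
    struct rtx_page_slot *s = rtx_blob_find(b, page);

    if (!s)
        return -1;
    assert(s->skb_users > 0);
    if (--s->skb_users == 0) {
        b->real_puts++;
        *s = b->slots[--b->nr_slots];   /* fill the hole with the last slot */
    }
    return 0;
}
```

With three SKBs attached to one page and one SKB to another, only two
real references are taken in total, and each is dropped exactly once
when its last user detaches.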

The hardest part is dealing with the page lifetime issues.
Unfortunately, when we trim the rtx queue, references to the clones
can still exist in the driver output path.  It's a difficult problem
to overcome in fact, so in the end my suggestion above might not
even be workable.

> No idea about what it could do, haven't yet looked web100, I was planning 
> at some point of time...

Web100 just provides statistics and other kinds of connection data
to userspace, all the actual algorithm etc. modifications have been
merged upstream and yanked out of the web100 patch.  I was looking
at it the other night and it's frankly totally uninteresting these
days :-)