Date: Thu, 24 Apr 2008 17:49:33 -0700
From: "Jerry Chu" <hkchu@...gle.com>
To: "John Heffner" <johnwheffner@...il.com>
Cc: netdev@...r.kernel.org, "rick.jones2" <rick.jones2@...com>, davem@...emloft.net
Subject: Re: Socket buffer sizes with autotuning

On Thu, Apr 24, 2008 at 9:32 AM, John Heffner <johnwheffner@...il.com> wrote:
>
> On Wed, Apr 23, 2008 at 4:29 PM, Jerry Chu <hkchu@...gle.com> wrote:
> >
> > I've been seeing the same problem here and am trying to fix it.
> > My fix is to not count those pkts still in the host queue as "prior_in_flight"
> > when feeding the latter to tcp_cong_avoid(). This should cause
> > tcp_is_cwnd_limited() test to fail when the previous in_flight build-up
> > is all due to the large host queue, and stop the cwnd to grow beyond
> > what's really necessary.
>
> Sounds like a useful optimization. Do you have a patch?

I am working on one, but still need to fully root-cause the problem
first and do a lot more testing. Like Rick Jones, I had thought for a
while that either the autotuning or the Congestion Window Validation
(RFC 2861) code should dampen the cwnd growth, so the bug must be
there. That was until last week, when I decided to get to the bottom
of this problem.

One question: I currently use skb_shinfo(skb)->dataref == 1 for skb's
on the sk_write_queue list as the heuristic to determine whether a
packet has hit the wire. This seems a good solution for the normal
cases, since it requires no driver changes to notify TCP in the xmit
completion path. But I can imagine cases where another below-IP
consumer of the skb, e.g., tcpdump, nullifies this heuristic. If the
below-IP consumer causes the skb ref count to drop to 1 prematurely,
the inflated cwnd problem comes back, but it's no worse than before.
What if the below-IP skb reader causes the skb ref count to remain > 1
long after the pkts have hit the wire? That may cause the fix to
prevent cwnd from growing when needed, hence hurting performance.
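For readers following along, the dataref heuristic described above can be modelled in user space roughly as below. This is only a sketch under stated assumptions, not the patch under discussion: `struct mock_skb`, `segs_in_host_queue()` and `wire_in_flight()` are hypothetical names invented for illustration, and `dataref` here is a plain integer standing in for skb_shinfo(skb)->dataref.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb on sk_write_queue.
 * This is NOT the kernel's struct sk_buff. */
struct mock_skb {
    int dataref;          /* models skb_shinfo(skb)->dataref */
    unsigned int pcount;  /* number of segments this skb carries */
};

/* Count segments whose data is still referenced below TCP
 * (dataref > 1), i.e. presumed to be sitting in the host queue. */
static unsigned int segs_in_host_queue(const struct mock_skb *q, size_t n)
{
    unsigned int segs = 0;

    for (size_t i = 0; i < n; i++) {
        assert(q[i].dataref >= 1);  /* assumed: TCP always holds one ref */
        if (q[i].dataref > 1)
            segs += q[i].pcount;
    }
    return segs;
}

/* in_flight with the host-queue backlog subtracted, clamped at zero.
 * The result is what would be fed on as "prior_in_flight". */
static unsigned int wire_in_flight(unsigned int in_flight,
                                   const struct mock_skb *q, size_t n)
{
    unsigned int queued = segs_in_host_queue(q, n);

    return queued >= in_flight ? 0 : in_flight - queued;
}
```

The failure mode raised above shows up directly in this model: any extra holder of the data (a packet tap, for instance) keeps dataref above 1, so those segments keep being counted as queued even after they have been transmitted.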
Is there a better solution than checking against dataref to determine
whether a pkt has hit the wire?

Also, the code that determines when and how much to defer in the TSO
path seems too aggressive. It's currently based on a percentage
(sysctl_tcp_tso_win_divisor) of min(snd_wnd, snd_cwnd). Would that be
too much when the window is large? E.g., when I disable
sysctl_tcp_tso_win_divisor, the cwnd of my simple netperf run drops
exactly 1/3 from 1037 (segments) to 695. It seems to me the TSO defer
factor should be based on an absolute count, e.g., 64KB.

Jerry

>
> -John
>
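To make the TSO deferral question concrete, here is a small user-space sketch comparing a window-fraction threshold with one bounded by an absolute byte count. It is an illustration under assumptions, not the kernel's tcp_tso_should_defer() logic: the function names are hypothetical, all sizes are taken in bytes, and capping the fraction at a fixed value is only one possible reading of "an absolute count".

```c
#include <assert.h>
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Fraction-based threshold: min(snd_wnd, cwnd) / divisor, in bytes.
 * A divisor of 0 models the sysctl being disabled and yields 0. */
static uint32_t defer_chunk_fraction(uint32_t snd_wnd_bytes,
                                     uint32_t cwnd_bytes,
                                     uint32_t divisor)
{
    if (divisor == 0)
        return 0;
    return min_u32(snd_wnd_bytes, cwnd_bytes) / divisor;
}

/* Assumed variant: the same fraction, but never more than cap_bytes. */
static uint32_t defer_chunk_capped(uint32_t snd_wnd_bytes,
                                   uint32_t cwnd_bytes,
                                   uint32_t divisor,
                                   uint32_t cap_bytes)
{
    assert(cap_bytes > 0);  /* a zero cap would disable deferral entirely */
    return min_u32(defer_chunk_fraction(snd_wnd_bytes, cwnd_bytes, divisor),
                   cap_bytes);
}
```

As a rough usage example, take a cwnd of 1037 segments and assume a 1448-byte MSS, a 2,000,000-byte snd_wnd and a divisor of 3 (all three values are assumptions, not figures from the test run). The cwnd is then 1,501,576 bytes, so the fraction-based threshold is 500,525 bytes, while the variant capped at 64KB (65,536 bytes) stays at 65,536 bytes.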