netdev - Re: Socket buffer sizes with autotuning

Message-ID: <48118308.1090407@firstfloor.org>
Date:	Fri, 25 Apr 2008 09:06:48 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Jerry Chu <hkchu@...gle.com>, davem@...emloft.net,
	johnwheffner@...il.com, rick.jones2@...com, netdev@...r.kernel.org
Subject: Re: Socket buffer sizes with autotuning

[fixed cc and subject]

Jerry Chu wrote:
> On Thu, Apr 24, 2008 at 3:21 PM, Andi Kleen <andi@...stfloor.org> wrote:
>> David Miller <davem@...emloft.net> writes:
>>
>>>> What is your interface txqueuelen and mtu?  If you have a very large
>>>> interface queue, TCP will happily fill it up unless you are using a
>>>> delay-based congestion controller.
>>> Yes, that's the fundamental problem with loss based congestion
>>> control.  If there are any queues in the path, TCP will fill them up.
>> That just means Linux does too much queueing by default.  Perhaps that
>> should be fixed. On Ethernet hardware the NIC TX queue should be
>> usually sufficient anyways I would guess. Do we really need the long
>> qdisc queue too?
> 
> I think we really need the large xmit queue, especially when the CPU speed,
> or the aggregated CPU bandwidth in the case of multi-cores, is >> NIC speed
> for the following reason:
> 
> If the qdisc and/or NIC queue is not large enough, it may not absorb the high
> burst rate from the much faster CPU xmit threads, hence causing pkts to
> be dropped before they hit the wire. 
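The point quoted above, that loss-based congestion control fills whatever queue sits in the path, can be illustrated with a toy model. This is a simplified sketch, not kernel code. It assumes Reno-style additive increase of one packet per RTT, halving on loss, a path that holds `bdp` packets in flight, and a drop-tail queue of `queue_cap` packets in front of the bottleneck; all names are hypothetical.

```python
def simulate_aimd(bdp: int, queue_cap: int, rtts: int, cwnd: int = 1):
    """Toy per-RTT model of loss-based (Reno-like) congestion control.

    bdp:       packets the path itself holds in flight (bandwidth-delay product)
    queue_cap: drop-tail queue length, in packets, in front of the bottleneck
    Returns one (cwnd, queued, lost) tuple per RTT.
    """
    history = []
    for _ in range(rtts):
        excess = max(0, cwnd - bdp)      # packets that do not fit on the "wire"
        lost = excess > queue_cap        # queue overflowed: at least one drop
        history.append((cwnd, min(excess, queue_cap), lost))
        if lost:
            cwnd = max(1, cwnd // 2)     # multiplicative decrease on loss
        else:
            cwnd += 1                    # additive increase: +1 packet per RTT
    return history


trace = simulate_aimd(bdp=10, queue_cap=100, rtts=300)
print(max(q for _, q, _ in trace))       # prints 100: the queue fills before each loss
```

With a 100-packet queue in front of a path that needs only 10 packets in flight, the window saws between 55 and 111 packets, so after the first loss the queue never holds fewer than 45 packets. Nothing in the loss-based rule prevents this; only a shorter queue or a delay-based controller (as mentioned above) keeps the standing queue small.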

sendmsg should just be a little smarter about when to block, depending on
the state of the interface. There is already some minor code for that,
as you'll have noted. Then the bursts would be much less of a problem.

We already had this discussion recently together with better behaviour
on bounding.

The only big problem then would be if there are more submitting threads
than packets in the TX queue, but I would consider that unlikely, at least
for Gigabit+ NICs (it might be an issue for older designs with smaller
queues).
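The burst concern and the blocking alternative can be contrasted in a small model. This is a hypothetical sketch, not the kernel's `sendmsg` path: `threads` senders each try to push `burst` packets into an interface queue with `slots` free entries, and the NIC is assumed not to drain anything during the burst. In the blocking variant, the "is the queue full?" check is assumed not to be atomic with the enqueue, so threads that see room race for the last slots.

```python
def drops_without_backpressure(threads: int, burst: int, slots: int) -> int:
    """Every packet goes straight into a drop-tail queue; overflow is lost."""
    return max(0, threads * burst - slots)


def drops_with_blocking(threads: int, burst: int, slots: int) -> int:
    """Hypothetical sendmsg-style back-pressure (not kernel code).

    Each round, every thread that still has data checks whether the
    interface queue is full. If it is not, all of them try to enqueue one
    packet each; the queue itself drops whatever does not fit. Once a check
    sees the queue full, the threads block and their remaining packets wait
    instead of being dropped.
    """
    queued = dropped = 0
    remaining = [burst] * threads
    while any(remaining):
        if queued >= slots:            # interface reports "full": senders sleep
            break
        for t in range(threads):       # all threads passed the same check
            if remaining[t] == 0:
                continue
            remaining[t] -= 1
            if queued < slots:
                queued += 1
            else:
                dropped += 1           # lost the race for the last free slot
    return dropped


print(drops_without_backpressure(4, 64, 128), drops_with_blocking(4, 64, 128))
# prints "128 0"
```

Without back-pressure the loss grows with the size of the burst. With the blocking check it is bounded by the number of racing threads (at most `threads - 1` in this model), which is why the remaining risk is the case of more submitting threads than TX queue entries.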

> Here the CPU/NIC relation is much like a router

It doesn't need to be. Unlike a true network, it is very cheap here
to provide direct feedback.

> Removing the unnecessary cwnd growth by counting out those pkts that are
> still stuck in the host queue may be a simpler solution. I'll find out
> how well it works soon.
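The proposal to count out packets still stuck in the host queue can be sketched as a change to an "is the flow cwnd-limited?" test that gates window growth. This sketch assumes the stack can tell how many unacknowledged packets are still sitting in the local qdisc or NIC queue (`host_queued`); the function names and the one-packet increase rule are hypothetical simplifications, not the kernel's TCP code.

```python
def cwnd_limited(cwnd: int, packets_out: int, host_queued: int) -> bool:
    """Is the congestion window really what limits the flow on the network?

    packets_out: packets sent by TCP and not yet acknowledged
    host_queued: how many of those are still in the local qdisc/NIC queue
    Packets still queued in the host are "counted out": they have not
    reached the network, so they say nothing about path capacity.
    """
    on_wire = packets_out - host_queued
    return on_wire >= cwnd


def cwnd_after_ack(cwnd: int, packets_out: int, host_queued: int) -> int:
    """Grow the window by one packet only when it is genuinely the limit.

    Simplified for illustration; real TCP increases the window differently
    in slow start and congestion avoidance.
    """
    return cwnd + 1 if cwnd_limited(cwnd, packets_out, host_queued) else cwnd


# 10 packets outstanding, but 4 of them still in the local queue: no growth.
print(cwnd_after_ack(10, 10, 4), cwnd_after_ack(10, 10, 0))  # prints "10 11"
```

With `host_queued=0` this reduces to the ordinary cwnd-limited check; the only change is that packets parked in the local queue no longer count toward filling the window, so a large host queue cannot by itself inflate cwnd.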

I think that's a great start, but probably not enough.

-Andi


