Message-ID: <20101211224018.GA2547@verge.net.au>
Date: Sun, 12 Dec 2010 07:40:20 +0900
From: Simon Horman <horms@...ge.net.au>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: netdev@...r.kernel.org, Ben Hutchings <bhutchings@...arflare.com>
Subject: Re: [PATCH] rfc: ethtool: early-orphan control
On Sat, Dec 11, 2010 at 06:11:20PM +0100, Eric Dumazet wrote:
> On Saturday 11 December 2010 at 09:03 +0100, Eric Dumazet wrote:
> > On Saturday 11 December 2010 at 13:24 +0900, Simon Horman wrote:
> > > On Sat, Dec 11, 2010 at 01:13:35PM +0900, Simon Horman wrote:
> > > > Early orphaning is an optimisation which avoids unnecessary cache misses by
> > > > orphaning an skb just before it is handed to a device for transmit thus
> > > > avoiding the case where the orphaning occurs on a different CPU.
> > > >
> > > > In the case of bonded devices this has the unfortunate side-effect of
> > > > breaking down flow control allowing a socket to send UDP packets as fast as
> > > > the CPU will allow. This is particularly undesirable in virtualised
> > > > network environments.
> > > >
> > > > This patch introduces ethtool control of early orphaning.
> > > > It remains on by default but it may now be disabled on a per-interface basis.
> > > >
> > > > I have implemented this as a generic flag.
> > > > As it seems to be the first generic flag that requires
> > > > no driver awareness I also supplied a default flag handler.
> > > > I am unsure if any aspect of this approach is acceptable.
> > > >
> > > > I believe Eric has it in mind that some of the calls
> > > > to skb_orphan() in drivers can be removed with the addition
> > > > of this feature. I need to discuss that with him further.
> > > >
> > > > A patch for the ethtool user-space utility accompanies this patch.
> > >
> > > The following results were measured under kvm using virtio without vhost net.
> > > The virtio device is bridged to a bond device which has one gigabit slave.
> > >
> >
> > As you know, vhost net does the orphaning, as do some NIC drivers,
> > so one UDP flood would have the same problem.
> >
> > I wonder if this problem could not be solved in other ways.
> >
> >
> > We might do early orphaning only for sockets with SOCK_USE_WRITE_QUEUE
> > flag asserted. (tcp sets it)
> >
> > Then, we could also ask: why does tcp use sock_wfree() at all...
> >
>
> I removed skb_orphan_try() and did a quick test, with bonding or not,
> same results on a Gigabit interface.
>
> $ netperf -C -c -4 -t UDP_STREAM -H 55.225.18.57 -- -m 1000
> UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 55.225.18.57 (55.225.18.57) port 0 AF_INET
> Socket  Message  Elapsed      Messages                   CPU      Service
> Size    Size     Time         Okay Errors   Throughput   Util     Demand
> bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
>
> 10000000    1000   10.00     6611385      0    5289.0     13.18    9.278
>  1000000           10.00     1163454              930.7      4.58    6.456
>
>
> As soon as 'socket size' is big enough, UDP flow control is ineffective,
> and no error is reported to user. sendto() says all frames were properly sent.
>
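For reference, the gap between the two netperf rows quoted above can be checked with a little arithmetic. This is only a sketch: it assumes the first row is the sending side and the second the receiving side (the usual netperf UDP_STREAM layout), and the helper name mbits_per_sec is made up for illustration.

```python
# Figures copied from the netperf output quoted above.
MESSAGE_BYTES = 1000     # -m 1000
ELAPSED_SECS = 10.00
SENT_OK = 6611385        # messages the sender's sendto() calls reported as okay
RECEIVED_OK = 1163454    # messages counted on the receiving side (assumed)


def mbits_per_sec(messages, message_bytes=MESSAGE_BYTES, secs=ELAPSED_SECS):
    """Convert a message count into 10^6 bits/sec, the unit netperf prints."""
    return messages * message_bytes * 8 / secs / 1e6


send_rate = mbits_per_sec(SENT_OK)
recv_rate = mbits_per_sec(RECEIVED_OK)
lost_fraction = 1 - RECEIVED_OK / SENT_OK

if __name__ == "__main__":
    print(f"sender claims   {send_rate:8.1f} Mbit/s")
    print(f"receiver sees   {recv_rate:8.1f} Mbit/s")
    print(f"never delivered {lost_fraction:8.1%} of messages")
```

The sender's claimed rate is several times what a gigabit link can carry, the receiver's rate sits just under line rate, and roughly four out of five messages never arrive even though the Errors column is 0.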
Yes, I've done that test too (as you suggested previously). But my thought
was that in a virtualised environment the administrator of the host can set
the socket size to be small enough and the guest can't change it.
However, I now realise that the same effect can be produced
in the guest's network stack by increasing wmem_default there.
So I'm not sure that this change is useful after all. And I've
got a worse flow control problem than I previously realised.
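
To illustrate the socket size point: a minimal sketch, assuming a Linux guest, of how any unprivileged process can ask for a larger send buffer and see what the kernel actually grants. The helper names effective_sndbuf and read_sysctl are hypothetical; SO_SNDBUF and the net.core.wmem_default / net.core.wmem_max sysctls are the standard Linux ones.

```python
import socket


def effective_sndbuf(requested_bytes):
    """Request a send buffer on a UDP socket; return what the kernel granted.

    On Linux the granted value is normally double the request (bookkeeping
    overhead) and is capped by net.core.wmem_max.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


def read_sysctl(name):
    """Read an integer sysctl via /proc; return None if it is unavailable."""
    path = "/proc/sys/" + name.replace(".", "/")
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for key in ("net.core.wmem_default", "net.core.wmem_max"):
        print(key, "=", read_sysctl(key))
    # Same "socket size" as the first netperf row quoted above.
    print("granted for 10000000:", effective_sndbuf(10000000))
```

Whatever limit the host applies to its own sockets, these values are read and set inside the guest's own network stack, which is why raising wmem_default there reproduces the problem.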