lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110106124439.GA17004@verge.net.au>
Date:	Thu, 6 Jan 2011 21:44:39 +0900
From:	Simon Horman <horms@...ge.net.au>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Rusty Russell <rusty@...tcorp.com.au>,
	virtualization@...ts.linux-foundation.org,
	Jesse Gross <jesse@...ira.com>, dev@...nvswitch.org,
	virtualization@...ts.osdl.org, netdev@...r.kernel.org,
	kvm@...r.kernel.org, "Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: Flow Control and Port Mirroring Revisited

On Thu, Jan 06, 2011 at 11:22:42AM +0100, Eric Dumazet wrote:
> Le jeudi 06 janvier 2011 à 18:33 +0900, Simon Horman a écrit :
> > Hi,
> > 
> > Back in October I reported that I noticed a problem whereby flow control
> > breaks down when openvswitch is configured to mirror a port[1].
> > 
> > I have (finally) looked into this further and the problem appears to relate
> > to cloning of skbs, as Jesse Gross originally suspected.
> > 
> > More specifically, in do_execute_actions[2] the first n-1 times that an skb
> > needs to be transmitted it is cloned first and the final time the original
> > skb is used.
> > 
> > In the case that there is only one action, which is the normal case, then
> > the original skb will be used. But in the case of mirroring the cloning
> > comes into effect. And in my case the cloned skb seems to go to the (slow)
> > eth1 interface while the original skb goes to the (fast) dummy0 interface
> > that I set up to be a mirror. The result is that dummy0 "paces" the flow,
> > and its a cracking pace at that.
> > 
> > As an experiment I hacked do_execute_actions() to use the original skb
> > for the first action instead of the last one.  In my case the result was
> > that eth1 "paces" the flow, and things work reasonably nicely.
> > 
> > Well, sort of. Things work well for non-GSO skbs but extremely poorly for
> > GSO skbs where only 3 (yes 3, not 3%) end up at the remote host running
> > netserv. I'm unsure why, but I digress.
> > 
> > It seems to me that my hack illustrates the point that the flow ends up
> > being "paced" by one interface. However I think that what would be
> > desirable is that the flow is "paced" by the slowest link. Unfortunately
> > I'm unsure how to achieve that.
> > 
> 
> Hi Simon !
> 
> "pacing" is done because skb is attached to a socket, and a socket has a
> limited (but configurable) sndbuf. sk->sk_wmem_alloc is the current sum
> of all truesize skbs in flight.
> 
> When you enter something that :
> 
> 1) Get a clone of the skb, queue the clone to device X
> 2) queue the original skb to device Y
> 
> Then :	Socket sndbuf is not affected at all by device X queue.
> 	This is speed on device Y that matters.
> 
> You want to get servo control on both X and Y
> 
> You could try to
> 
> 1) Get a clone of skb
>    Attach it to socket too (so that socket get a feedback of final
> orphaning for the clone) with skb_set_owner_w()
>    queue the clone to device X
> 
> Unfortunatly, stacked skb->destructor() makes this possible only for
> known destructor (aka sock_wfree())

Hi Eric !

Thanks for the advice. I had thought about the socket buffer but at some
point it slipped my mind.

In any case the following patch seems to implement the change that I had in
mind. However my discussions Michael Tsirkin elsewhere in this thread are
beginning to make me think that think that perhaps this change isn't the
best solution.

diff --git a/datapath/actions.c b/datapath/actions.c
index 5e16143..505f13f 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -384,7 +384,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 
 	for (a = actions, rem = actions_len; rem > 0; a = nla_next(a, &rem)) {
 		if (prev_port != -1) {
-			do_output(dp, skb_clone(skb, GFP_ATOMIC), prev_port);
+			struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
+			if (nskb) {
+				if (skb->sk)
+					skb_set_owner_w(nskb, skb->sk);
+				do_output(dp, nskb, prev_port);
+			}
 			prev_port = -1;
 		}

I got a rather nasty panic without the if (skb->sk),
I guess some skbs don't have a socket.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ