Message-Id: <1209461278.2873.34.camel@ymzhang>
Date: Tue, 29 Apr 2008 17:27:58 +0800
From: "Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
To: "Brandeburg, Jesse" <jesse.brandeburg@...el.com>
Cc: netdev@...r.kernel.org, Rick Jones <rick.jones2@...com>
Subject: RE: netperf udp_rr testing hang
On Mon, 2008-04-28 at 11:43 +0800, Zhang, Yanmin wrote:
> On Sun, 2008-04-27 at 16:47 -0700, Brandeburg, Jesse wrote:
> Thanks for your kind response. I think it might be an issue of kernel. Pls. see below comments.
I located the root cause.
The kernel is OK. It's an issue in netperf.
I instrumented the kernel and turned on netperf debugging to capture more data.
netserver on the Server1 machine binds its UDP socket to IP 0.0.0.0 and the data port,
whereas netperf on the Client1 machine binds the local IP 192.168.1.164 with bind() and sets
the remote IP 192.168.1.153 with connect(). Because the server socket is bound only to
0.0.0.0, the kernel on Server1 simply picks one of Server1's IPs as the source IP when it
sends the response. In the tcpdump below it picked 192.168.1.160 (lkp-tt02-x8664.tsp.org),
which is not the address the client socket is connected to, so the kernel on Client1 drops
the packet and answers with ICMP port unreachable.
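The drop described above can be reproduced on a single machine. What follows is only a minimal sketch, not netperf code: it assumes loopback is available, and it uses two different source ports on 127.0.0.1 as a stand-in for the two source IPs of Server1, since a connected UDP socket is matched against both the peer address and the peer port. The function names are made up for illustration.

```python
import socket


def recv_or_none(sock, timeout=0.5):
    """Return one datagram from sock, or None if nothing is delivered in time."""
    sock.settimeout(timeout)
    try:
        return sock.recv(64)
    except (socket.timeout, ConnectionRefusedError):
        return None


def connected_udp_filter_demo():
    """Show that a connected UDP socket only sees datagrams from its peer."""
    expected = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # the peer we connect to
    other = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)     # stand-in for the "wrong" source
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        expected.bind(("127.0.0.1", 0))
        other.bind(("127.0.0.1", 0))
        client.bind(("127.0.0.1", 0))

        # Like netperf: bind the local side, then connect() to one remote endpoint.
        client.connect(expected.getsockname())

        # A datagram from a source other than the connected peer.
        other.sendto(b"from-other", client.getsockname())
        seen_from_other = recv_or_none(client)

        # A datagram from the connected peer itself.
        expected.sendto(b"from-peer", client.getsockname())
        seen_from_peer = recv_or_none(client)

        return seen_from_other, seen_from_peer
    finally:
        for s in (expected, other, client):
            s.close()


if __name__ == "__main__":
    print(connected_udp_filter_demo())
```

On Linux I would expect the first element to be None (the datagram never reaches the connected socket) and the second to be the payload sent by the peer, which matches what the UDP_RR test sees.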
The fix could be one of the following:
1) Don't call connect() in netperf for UDP testing. But then it looks like the transactions
just pass through one interface and are not distributed across the 2 interfaces;
2) Pass remote_ip to the server in udp_rr_request, so the server can bind to that address;
Option 1 is simpler.
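To illustrate the two options, here is a rough sketch under the same assumptions as before: loopback only, with different ports on 127.0.0.1 standing in for the different IPs of Server1. It is not the netperf implementation, and udp_socket and unconnected_round_trip are hypothetical helper names.

```python
import socket


def udp_socket(ip="127.0.0.1"):
    """Create a UDP socket bound to a specific local IP and an ephemeral port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((ip, 0))
    s.settimeout(1.0)
    return s


def unconnected_round_trip():
    """Option 1: the client never calls connect(), so any source is accepted."""
    server_rx = udp_socket()  # where the request arrives
    server_tx = udp_socket()  # reply leaves from a different source (stand-in for the other IP)
    client = udp_socket()
    try:
        client.sendto(b"req", server_rx.getsockname())
        _request, client_addr = server_rx.recvfrom(64)

        # The reply comes from an endpoint the client never sent to.
        server_tx.sendto(b"rsp", client_addr)

        # recvfrom() on an unconnected socket delivers it and reports the source,
        # so checking the source is now up to the application.
        reply, reply_src = client.recvfrom(64)
        return reply, reply_src, server_rx.getsockname(), server_tx.getsockname()
    finally:
        for s in (server_rx, server_tx, client):
            s.close()


if __name__ == "__main__":
    print(unconnected_round_trip())
```

Option 2 would correspond to the server calling something like udp_socket(requested_ip) with the address received from the client instead of binding 0.0.0.0, so that replies leave with the source IP the client connected to.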
-yanmin
>
> > are you turning on arp_filter in sysctl?
> No. I use the default configuration, i.e. arp_filter=0.
>
> >
> > IMO you can't use two IP addresses in the same subnet on the same switch anyway, even with arp filter.
> >
> > if you were to assign 192.168.0.X to one interface and 192.168.1.X to the other, *and* then use arp filter it will work okay.
> Why does TCP work well? Lab manager just configures 192.168.0.XXX on the dns server.
>
> I tried arp_filter=1 a moment ago and it doesn't work.
>
> I checked document e1000.txt and it says Multiple Interfaces on Same Ethernet Broadcast Network
> results in unbalanced receive traffic. But it doesn't say it will break the network.
>
> From the tcpdump info, I think kernel on the client machine always drops the first UDP packet unexpectedly, after
> the server (lkp-tt02-nic2.tsp.org) sends the first UDP response back. If I disable 192.168.1.160 (eth0:lkp-tt02-x8664.tsp.org)
> on server, the testing could go ahead. That also means the issue isn't relevant to arp_filter. If I firstly disable
> 192.168.1.160 (eth0:lkp-tt02-x8664.tsp.org) and then reenable it, restart testing, the testing also could go ahead, but then
> the testing to IP 192.168.1.160 becomes hanging. So it looks like only one ip at the same time could be
> available for UDP.
>
> >
> > jesse
> >
> > -----Original Message-----
> > From: netdev-owner@...r.kernel.org [mailto:netdev-owner@...r.kernel.org] On Behalf Of Zhang, Yanmin
> > Sent: Friday, April 25, 2008 12:42 AM
> > To: netdev@...r.kernel.org
> > Cc: Rick Jones
> > Subject: netperf udp_rr testing hang
> >
> > I am testing network UDP by netperf V2.4.4.
> >
> > I have 2 machines. Every machine has 2 NIC, so 2 IP addresses per machine.
> > Client1: 192.168.1.164 (eth1:lkp-h01-nic2.tsp.org) and 192.168.1.169 (eth2:lkp-h01.tsp.org).
> > Server1: 192.168.1.160 (eth0:lkp-tt02-x8664.tsp.org) and 192.168.1.153 (eth1:lkp-tt02-nic2.tsp.org).
> >
> > They are connected to the same GIGA switch.
> >
> > On Server1, start netserver:
> > #./netserver&
> >
> > Then, on Client1: start netperf:
> > #./netperf -t UDP_RR -l 60 -H 192.168.1.153 -L 192.168.1.164 -i 3,3 -I 99,5 -- -r 1,1
> > It looks like netperf hangs and exits after 180(or 60?) seconds. The result shows
> > RatePerSec is 0.0.
> > If I use tcpdump to intercept all packets on eth1 on client1, the dump shows:
> > 14:49:14.820924 IP lkp-h01.tsp.org.41456 > lkp-tt02-nic2.tsp.org.12865: . ack 1 win 46 <nop,nop,timestamp 1691048 1803244>
> > 14:49:14.821043 IP lkp-h01.tsp.org.ssh > lkp-os.tsp.org.45485: P 176:368(192) ack 49 win 146 <nop,nop,timestamp 1691048 1575061177>
> > 14:49:14.821047 IP lkp-h01.tsp.org.41456 > lkp-tt02-nic2.tsp.org.12865: P 1:257(256) ack 1 win 46 <nop,nop,timestamp 1691048 1803244>
> > 14:49:14.821157 IP lkp-tt02-nic2.tsp.org.12865 > lkp-h01.tsp.org.41456: . ack 257 win 54 <nop,nop,timestamp 1803244 1691048>
> > 14:49:14.821307 IP lkp-tt02-nic2.tsp.org.12865 > lkp-h01.tsp.org.41456: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 1803244 1691048>
> > 14:49:14.821312 IP lkp-h01.tsp.org.41456 > lkp-tt02-nic2.tsp.org.12865: . ack 257 win 54 <nop,nop,timestamp 1691048 1803244>
> > 14:49:14.821348 IP lkp-h01.tsp.org.54226 > lkp-tt02-nic2.tsp.org.42164: UDP, length 1
> > 14:49:14.821406 IP lkp-tt02-x8664.tsp.org.42164 > lkp-h01.tsp.org.54226: UDP, length 1
> > 14:49:14.821415 IP lkp-h01.tsp.org > lkp-tt02-x8664.tsp.org: ICMP lkp-h01.tsp.org udp port 54226 unreachable, length 37
> >
> >
> > If I use tcpdump to intercept all packets on eth1 on Server1, the dump shows:
> > 23:54:12.320760 IP lkp-h01.tsp.org.41456 > lkp-tt02-nic2.tsp.org.12865: S 2825016431:2825016431(0) win 5840 <mss 1460,sackOK,timestamp 1691048 0,nop,wscale 7>
> > 23:54:12.320858 IP lkp-h01.tsp.org.41456 > lkp-tt02-nic2.tsp.org.12865: . ack 1965002601 win 46 <nop,nop,timestamp 1691048 1803244>
> > 23:54:12.321010 IP lkp-h01.tsp.org.41456 > lkp-tt02-nic2.tsp.org.12865: P 0:256(256) ack 1 win 46 <nop,nop,timestamp 1691048 1803244>
> > 23:54:12.321259 IP lkp-h01.tsp.org.41456 > lkp-tt02-nic2.tsp.org.12865: . ack 257 win 54 <nop,nop,timestamp 1691048 1803244>
> > 23:54:12.321271 IP lkp-h01.tsp.org.54226 > lkp-tt02-nic2.tsp.org.42164: UDP, length 1
> >
> >
> > If I start netperf by below command:
> > #./netperf -t UDP_RR -l 60 -H 192.168.1.160 -L 192.168.1.164 -i 3,3 -I 99,5 -- -r 1,1
> > The testing really goes ahead and prints correct result after testing. However, tcpdump shows
> > the packets just pass between lkp-h01.tsp.org.50303 and lkp-tt02-x8664.tsp.org.41305, not
> > lkp-h01-nic2.tsp.org.50303 and lkp-tt02-x8664.tsp.org.41305
> >
> > I checked the netperf source code, and send_udp_rr really does bind the correct local/host IP.
> >
> > I tried TCP_RR and it has no hang issue, although packets might be sent out from another IP.
> >
> > I tested it with kernel 2.6.22/23/24/25.
> >
> > Thanks,
> > Yanmin
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html