Message-ID: <20120315204719.487b6ffe@vostro>
Date: Thu, 15 Mar 2012 20:47:19 +0200
From: Timo Teras <timo.teras@....fi>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Francois Romieu <romieu@...zoreil.com>,
Ben Hutchings <bhutchings@...arflare.com>,
netdev@...r.kernel.org
Subject: Re: linux-3.0.18+r8169+ipv4/tcp forwarding = tso/gso weirdness and
performance degration
On Thu, 15 Mar 2012 09:11:46 -0700 Eric Dumazet
<eric.dumazet@...il.com> wrote:
> On Thu, 2012-03-15 at 17:11 +0200, Timo Teras wrote:
> > On Thu, 15 Mar 2012 08:06:35 +0200 Timo Teras <timo.teras@....fi>
> > wrote:
> >
> > > On Wed, 14 Mar 2012 21:53:19 +0100 Francois Romieu
> > > <romieu@...zoreil.com> wrote:
> > >
> > > > Timo Teras <timo.teras@....fi> :
> > > > [...]
> > > > > # ethtool -S eth2
> > > > > NIC statistics:
> > > > > tx_packets: 2069391193
> > > > > rx_packets: 3245815642
> > > > > tx_errors: 0
> > > > > rx_errors: 645238
> > > > > rx_missed: 31414
> > > >
> > > > It does not look like stuff for the higher layers guys.
> > > >
> > > > Can you tshark -w foobar on the sender side and
> > > > 'while : ; do sleep 1; ethtool -S eth2 >> glop; done' on the
> > > > receiver during a bad wget (a big zero filled file should
> > > > compress well).
> > >
> > > Indeed.
> > >
> > > It seems that my earlier tests of the "GRO off" effect were
> > > mistaken (I accidentally used a proxy, and that gave the illusion
> > > that things were working. Whoops.)
> > >
> > > So far I changed the cross-over cable and it didn't help. However,
> > > forcing the NIC to 100mbit/full-duplex mode fixes the rx_errors.
> > > It seems that something bad is happening in the gigabit mode.
> > >
> > > I wonder if it's using pause frames and that's messing things up.
> > > Seems that I can't turn it off, though.
> > >
> > > I can also double check my cables, though it is factory made
> > > Cat-5E cross-over cable; and happens with two different cables.
> >
> > Ok. So far I have two of these boxes with the same r8169 hardware.
> > Both generate bad packets on transmit only, and on both three-NIC
> > systems it's the middle NIC, eth1. The symptoms are identical: in
> > 1 Gbit/s mode I have minor packet loss, whereas 100 Mbit/s mode seems
> > to work just fine.
> >
> > The first box, the one I've been talking about so far, is (as
> > mentioned) connected to another similar box. The r8169 there reports
> > rx_errors. The cable is ok; I've tried with two different ones.
> >
> > The other broken box is connected to a HP ProCurve 4202vl-48G, and
> > the switch is reporting drops due to FCS Rx errors.
> >
> > So I have two broken pieces of hardware, or there is a driver bug.
> >
> > I'll try upgrading my kernel to 3.0.x series on the sender box and
> > see if it's fixing anything. Suggestions for further testing would
> > be appreciated.
>
> r8169 has to make an additional copy of incoming frames, because of
> a hardware flaw and security requirements.
>
> This was added in 2.6.37 or 2.6.38, I don't remember exactly.
>
> So your CPU might be too slow to handle the load at 1 Gbit/s speed.
>
> If you have one flow, there is nothing to do, but if your workload has
> several flows and your machine is SMP, you can try RPS/RFS as
> documented in Documentation/networking/scaling.txt

No. The amount of traffic on the link is exactly the same in both
cases: approx. 50-80 Mbit/s. If the link is in 100 Mbit/s mode,
everything is perfect. But if the link is in 1 Gbit/s mode (while still
carrying only 50-80 Mbit/s on average), there is packet loss, which
kills TCP performance.

There is definitely a hardware or driver issue.
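
One way to make the comparison between the two link modes concrete is to
diff two saved snapshots of `ethtool -S` output and see which counters
grew. The following is only a minimal sketch under stated assumptions: the
function names `parse_ethtool_stats` and `counter_deltas` are hypothetical
helpers, not part of ethtool or the r8169 driver, and the parser assumes
the plain "name: value" layout shown in the quoted output above.

```python
import re


def parse_ethtool_stats(text):
    """Parse `ethtool -S` text into a dict of counter name -> int.

    Assumes one "name: value" pair per line; lines that do not fit
    (such as the "NIC statistics:" header) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\w.\-]+):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def counter_deltas(before, after):
    """Return only the counters that increased between two snapshots,
    mapped to the size of the increase."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }
```

Taking one snapshot before and one after a transfer, once in 100 Mbit/s
mode and once in 1 Gbit/s mode, and passing each pair through
`counter_deltas` would show whether rx_errors and rx_missed climb only in
gigabit mode while the traffic volume stays the same.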