netdev - Re: b44: Reset due to FIFO overflow.


Message-ID: <1277788678.4235.1285.camel@edumazet-laptop>
Date:	Tue, 29 Jun 2010 07:17:58 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Mitchell Erblich <erblichs@...thlink.net>
Cc:	James Courtier-Dutton <james.dutton@...il.com>,
	netdev@...r.kernel.org
Subject: Re: b44: Reset due to FIFO overflow.

On Monday 28 June 2010 at 14:21 -0700, Mitchell Erblich wrote:
> On Jun 28, 2010, at 4:09 AM, Eric Dumazet wrote:
> 
> > On Monday 28 June 2010 at 11:17 +0100, James Courtier-Dutton wrote:
> >> On 28 June 2010 11:00, Eric Dumazet <eric.dumazet@...il.com> wrote:
> >>> 
> >>> Problem is we receive a spike of RX network frames (possibly UDP or some
> >>> other RX-only traffic), and the chip raises an RX FIFO overflow _error_
> >>> indication.
> >>> 
> 
> IMO, spikes are a normal behaviour.

Yes, this is why I said the NIC is buggy if it requires a reset (lasting
a _very_ long time) under a normal condition.

> 
> >> 
> >> The cause of the RX overflow in my case is TCP.
> >> It is reproducible in mythtv.
> >> While watching LiveTV, press "s" for the program guide.
> >> The program guide is implemented into mythtv by a SQL query that
> >> results in a large response.
> >> The kernel is probably not servicing the RX FIFO quickly enough due to
> >> it being busy doing something else. In this case, probably a video
> >> mode switch.
> >> 
> > 
> > That's strange, b44 has a big RX ring... and the TCP sender should wait
> > for ACKs...
> > 
> 
> Slow start, etc. SHOULD/CAN double the number of in-flight segments in each
> successive round trip, placing them back to back.
> 

The RX ring buffer is about 200 frames on b44. A single TCP flow should
fit.

The maximum ring size is 511. James, did you try increasing the RX ring?

ethtool -G eth0 rx 511
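
To put rough numbers on the slow-start argument, here is a minimal
sketch. It rests on assumptions that are not from this thread: an
initial window of 3 segments, pure doubling per round trip, one frame
per segment, and a receiver that drains nothing while a burst arrives
(the worst case). The function names and the max_window parameter are
made up for illustration.

```python
def burst_sizes(initial_window, max_window, rounds):
    """Back-to-back segments per round trip under idealized slow start."""
    sizes = []
    cwnd = initial_window
    for _ in range(rounds):
        # The sender can never have more in flight than the receiver allows.
        sizes.append(min(cwnd, max_window))
        cwnd *= 2
    return sizes


def first_overflow_round(ring_size, initial_window=3, max_window=10**9, rounds=32):
    """Index of the first round trip whose burst exceeds the RX ring.

    Returns None if no burst within `rounds` round trips exceeds the ring.
    """
    for index, burst in enumerate(burst_sizes(initial_window, max_window, rounds)):
        if burst > ring_size:
            return index
    return None


if __name__ == "__main__":
    for ring in (200, 511):
        print(ring, first_overflow_round(ring))
    # Same ring, but with the flow capped at 44 segments in flight
    # (an assumed 64 KB window with 1460-byte segments).
    print(200, first_overflow_round(200, max_window=44))
```

Under these assumptions an uncapped flow needs several round trips of
uninterrupted doubling before one burst is larger than the ring, and a
flow capped well below the ring size never gets there on its own.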

> IMO, a stress test would be a large number/wirespeed set of pings?
> 

It is better to use frames that are going to slow down the receiver.
Say, multicast traffic with 100 receivers on the same multicast group.
Send 1000 consecutive frames; the last ones will trigger an RX overflow,
because the softirq handler cannot be fast enough.

Ping is answered by the kernel; it's pretty fast.
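
The multicast test described above could be sketched roughly as
follows. This is only an illustration: the group address, port, payload
size and function names are assumptions, and whether it actually
triggers the overflow depends on the machine and the link.

```python
import socket
import struct

GROUP = "239.1.2.3"  # hypothetical administratively scoped multicast group
PORT = 5007          # hypothetical UDP port


def open_receivers(count, group=GROUP, port=PORT):
    """Open `count` UDP sockets on this host, all joined to the same group.

    Every incoming frame then has to be delivered to each socket, which
    is what makes the receive path slow.
    """
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    socks = []
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # SO_REUSEADDR lets several sockets bind the same UDP port.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", port))
        s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        socks.append(s)
    return socks


def send_burst(sock, count=1000, size=1400, group=GROUP, port=PORT):
    """Send `count` datagrams back to back; return how many were sent."""
    payload = bytes(size)
    sent = 0
    for _ in range(count):
        sock.sendto(payload, (group, port))
        sent += 1
    return sent


# On the b44 host:      receivers = open_receivers(100)
# On the sending host:  tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#                       tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
#                       send_burst(tx)
```

The receivers never read their sockets in this sketch; the point is
only the per-frame delivery cost on the receiving side.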

> >>> Some hardware is buggy enough that such an error indication is fatal and
> >>> _requires_ a hardware reset. That's life. I suspect the b44 driver doing a
> >>> full reset is not a random guess by the driver author, but a way to avoid
> >>> a complete NIC lockup.
> >>> 
> >> 
> >> Interesting, which hardware, apart from the b44, is it that "requires"
> >> a hardware reset after an RX FIFO overflow?
> > 
> > Just take a look at some net drivers and you'll see some of them have
> > this requirement.
> > 
> > rtl8169_rx_interrupt()
> > ...
> > 	if (status & RxFOVF) {
> > 		rtl8169_schedule_work(dev, rtl8169_reset_task);
> > 		dev->stats.rx_fifo_errors++;
> > 	}
> > 
> > 
> > 
> > 
> 
> 
> If they can reset in say X frame loss units, then why not reset if
> X is an acceptable number?
> 

Because a reset is an exception. While the card is being reset, we lose
many TX and RX frames, so this should be the very last thing to consider.

Why not a complete reboot of the host while we are at it?

> And a hammer may fix the dent, while I may be more
> interested in preventing the dent in the first place.

So? Please submit alternative firmware for this NIC, or provide
another NIC for the thousands of machines that are stuck with it.


