netdev - Re: [PATCH] firewire: net: rate-limit log spam at transmit failure

Message-ID: <4CD68925.8080302@s5r6.in-berlin.de>
Date:	Sun, 07 Nov 2010 12:10:29 +0100
From:	Stefan Richter <stefanr@...6.in-berlin.de>
To:	Maxim Levitsky <maximlevitsky@...il.com>
CC:	linux1394-devel@...ts.sourceforge.net,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH] firewire: net: rate-limit log spam at transmit failure

Maxim Levitsky wrote:
> On Sun, 2010-11-07 at 00:23 +0100, Stefan Richter wrote:
>> On  6 Nov, Stefan Richter wrote:
>>> Then I tried an XIO2213A card in the AMD PC (again the Intel PC as peer)
>>> and got 243 times "failed: 12" i.e. RCODE_BUSY and 81 times "failed: 10"
>>> i.e. RCODE_SEND_ERROR during ftp transfer of a >500 MB large file from
>>> XIO2213A to FW323.
> 
> I also am getting strange results (but very good compared to what I had
> recently).
> 
> With all your patches, I get very stable TCP and UDP streams from laptop
> to desktop at 180~190 Mbits/s.
> 
> However, the opposite direction (desktop->laptop) still suffers from
> tlabel exhaustion.
> I added some printks, and I clearly see that netif_stop_queue doesn't
> always work (probably this is intended?).
> 
> If I replace == with >= in inc_queue_packets and similar in
> dec_queued_packets, then tlabel exhaustion disappears, and I get ~240
> Mbit/s on TCP and UDP.

Remind me, is this FireWire 800?  And what controllers in particular?  I get
about half of your numbers with FireWire 400 connections.

The == vs. >= is a good hint.  If .ndo_start_xmit can be entered by multiple
CPUs, the upper limit will clearly be exceeded eventually.
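
To illustrate the difference between the two comparisons, here is a minimal
single-threaded sketch, not the driver code.  All names (sim_dev, sim_inc_eq,
sim_inc_ge, sim_run, SIM_MAX_QUEUED) and the limit of 8 are made up for the
illustration.  It assumes that a few callers were already past the stack's
queue-stopped test when the limit was reached, and merely counts how often
each form of the check fires.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_QUEUED 8	/* hypothetical limit, not the driver's value */

struct sim_dev {
	int queued;		/* packets currently queued */
	int stop_calls;		/* how often the stop condition fired */
};

/* Equality check: fires only when the counter lands exactly on the limit. */
static void sim_inc_eq(struct sim_dev *d)
{
	if (++d->queued == SIM_MAX_QUEUED)
		d->stop_calls++;
}

/* Range check: fires on every increment at or above the limit. */
static void sim_inc_ge(struct sim_dev *d)
{
	if (++d->queued >= SIM_MAX_QUEUED)
		d->stop_calls++;
}

/*
 * Fill the queue up to the limit, then let `late` more callers in, as if
 * they had passed the queue-stopped test before the stop took effect.
 */
static struct sim_dev sim_run(bool use_ge, int late)
{
	struct sim_dev d = { 0, 0 };
	int i;

	assert(late >= 0);
	for (i = 0; i < SIM_MAX_QUEUED + late; i++) {
		if (use_ge)
			sim_inc_ge(&d);
		else
			sim_inc_eq(&d);
	}
	return d;
}
```

With three late callers, both variants end up three packets above the limit
in this model.  The == form fires once, at the exact crossing, while the >=
form fires on every entry at or above the limit.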

With >= instead of ==, the same test as that quoted above gives 71x RCODE_BUSY
+ 0x RCODE_SEND_ERROR, and 59x RCODE_BUSY + 0x RCODE_SEND_ERROR in a
repetition.  (0x + 0x in the other direction.)  There were no RCODE_CANCELLED
occurrences, which I had occasionally in the past.

I then tried

	if (dev->queued_packets >= FWNET_MAX_QUEUED_PACKETS)
		return NETDEV_TX_BUSY;

at the top of fwnet_tx but it did not change the amount of RCODE_BUSY, which
is not too surprising.  So next I should have a look at the responder side again.

BTW, FireWire 400 CardBus controllers usually feature a limitation of max_rec
= 1024 (maximum size of asynchronous packets they can receive).  Incidentally,
the VT6306 card that I used in my other tests from yesterday is one of those.
So, since link fragmentation is quite common because of this kind of card, I
should perhaps count queued fragments instead of queued datagrams.
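
As a rough sketch of how much the two counts differ: the helper below
estimates the number of link fragments per datagram.  It assumes the RFC 2734
header sizes (4 bytes unfragmented, 8 bytes per fragment) and that the
peer's whole max_rec size is available as packet payload.  The names
(sim_fragments, SIM_UNFRAG_HDR, SIM_FRAG_HDR) are made up for the
illustration and are not taken from the driver.

```c
#include <assert.h>

#define SIM_UNFRAG_HDR	4	/* assumed: RFC 2734 unfragmented header */
#define SIM_FRAG_HDR	8	/* assumed: RFC 2734 fragment header */

/*
 * Number of link fragments needed for a datagram of dg_size bytes when
 * each packet may carry at most max_payload bytes including its header.
 */
static unsigned int sim_fragments(unsigned int dg_size,
				  unsigned int max_payload)
{
	unsigned int per_frag;

	assert(max_payload > SIM_FRAG_HDR);

	/* Fits into a single packet with the short header. */
	if (dg_size + SIM_UNFRAG_HDR <= max_payload)
		return 1;

	/* Otherwise every fragment carries the longer fragment header. */
	per_frag = max_payload - SIM_FRAG_HDR;
	return (dg_size + per_frag - 1) / per_frag;
}
```

Under these assumptions, a 1500 byte datagram sent to a peer with max_rec =
1024 takes two fragments, while a peer that accepts 2048 byte packets gets it
in one.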

> UDP transfers work quite well, tested for a few minutes.
> TCP transfers unfortunately trigger (probably a hardware) bug in notebook
> OHCI controller (I have seen that many times so far.)
> 
> Transfer just stops, and controller goes south.
> If I unload the firewire-ohci, then when I load it:
> 
> [ 2062.632532] firewire_ohci 0000:07:00.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
> [ 2072.650173] firewire_ohci: Failed to reset ohci card.
> [ 2072.650267] firewire_ohci 0000:07:00.0: PCI INT A disabled
> [ 2072.650314] firewire_ohci: probe of 0000:07:00.0 failed with error -16
> 
> 
> Only suspend to ram helps bring it back from that state.

On the bright side, s2ram fixes things for once instead of breaking them...
-- 
Stefan Richter
-=====-==-=- =-== --===
http://arcgraph.de/sr/