netdev - Re: [PATCH 3/5] net: mvneta: do not schedule in mvneta_tx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140114153342.GC32193@1wt.eu>
Date:	Tue, 14 Jan 2014 16:33:42 +0100
From:	Willy Tarreau <w@....eu>
To:	Ben Hutchings <ben@...adent.org.uk>
Cc:	davem@...emloft.net, netdev@...r.kernel.org,
	Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>,
	Gregory CLEMENT <gregory.clement@...e-electrons.com>
Subject: Re: [PATCH 3/5] net: mvneta: do not schedule in mvneta_tx_timeout

Hi Ben,

On Sun, Jan 12, 2014 at 05:38:53PM +0000, Ben Hutchings wrote:
> I think this will DTRT, but it's compile-tested only.  I have been given
> an OpenBlocks AX3 but haven't set it up yet.

OK I just managed to test your patch. I managed to force a Tx timeout by
forcing the link to 100/half and transfering 1000 concurrent streams.

Unfortunately for now the patch doesn't manage to recover, and the system
randomly panics one or two seconds after the link is brought up. Twice the
system did not panic but I lost all communications until a down/up cycle,
after which a panic happened during transfers.

However I could verify that the scheduled function is correctly called. I
suspect that something else might be wrong in the driver's reset sequence
(eg: unmapping pages still in use by the NIC or I don't know what), but
your patch does exactly what it's supposed to do.

At least, if the restart function does not do anything, everything works
fine. I see that the function is called (I added printk there) and the
transfer is not perturbated at all anymore.

So now I'm wondering whether the right thing should not be to just keep
your scheduled function and make it only log that a timeout was caught.

Another point which bothers me is that I suspect we're triggering Tx
timeouts too fast, because I regularly get these on 100 Mbps during
regular traffic (which ended up in immediate panics with previous code).

Thanks,
Willy

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html