Date:	Fri, 28 Mar 2014 09:34:47 +0100
From:	Sebastian Andrzej Siewior <sebastian@...akpoint.cc>
To:	Claudiu Manoil <claudiu.manoil@...escale.com>
Cc:	Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org,
	"David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH][net-next] gianfar: Simplify MQ polling to avoid soft
 lockup

On 2014-03-28 10:19:07 [+0200], Claudiu Manoil wrote:
> >My problem is that when gfar_start_xmit() is preempted after the
> >tx_queue->tx_skbuff[tx_queue->skb_curtx] is set but before the DMA is started
> >then the NAPI-poll never completes because it sees a packet which never
> >completes because the DMA engine did not start yet and won't.
> 
> False, that code section from start_xmit() cannot be preempted, because
> it has spin_lock_irqsave()/restore() around it (unless you modified
> your code).  Will check though if on SMP, for some reason,
> clean_tx_ring() enters with 0 skbs to clean.

I said on -RT. On mainline it can't be preempted, as I said. If for
some reason you can't get your packet out (on a slow link, as in your
case), the TX cleanup will return with 0 cleanups.
This has been broken since c233cf4 ("gianfar: Fix tx napi polling"),
because you drop the return value.
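
To make the point about the dropped return value concrete, here is a
small user-space model. It is only a sketch and not driver code: every
name in it (tx_ring_model, model_clean_tx_ring, the two poll variants)
is hypothetical, and the first variant reflects my reading of the
description above (a pending skb counts as TX work even when nothing
was cleaned). The second variant is not the change proposed in this
thread; it only shows what taking the cleanup count into account looks
like.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a TX ring, not the gianfar structures. */
struct tx_ring_model {
	int pending;	/* skbs queued for transmit, not yet reclaimed */
	int dma_done;	/* how many of those the hardware has finished */
};

/* Reclaim completed packets; returns how many were cleaned (may be 0). */
static int model_clean_tx_ring(struct tx_ring_model *r)
{
	int cleaned = r->dma_done;

	assert(cleaned <= r->pending);
	r->pending -= cleaned;
	r->dma_done = 0;
	return cleaned;
}

/* Variant A (assumed): a pending skb counts as TX work and the
 * cleanup count is ignored. Returns true if the poll stays scheduled. */
static bool poll_ignoring_result(struct tx_ring_model *r)
{
	bool has_tx_work = false;

	if (r->pending > 0) {
		model_clean_tx_ring(r);	/* return value dropped */
		has_tx_work = true;
	}
	return has_tx_work;
}

/* Variant B (illustration only): the poll stays scheduled only if
 * the cleanup actually reclaimed something. */
static bool poll_using_result(struct tx_ring_model *r)
{
	int cleaned = 0;

	if (r->pending > 0)
		cleaned = model_clean_tx_ring(r);
	return cleaned > 0;
}

/* Count the poll rounds that ask to be polled again, capped at 'cap'.
 * The ring is passed by value, so the caller's copy is not modified. */
static int rounds_until_complete(bool (*poll)(struct tx_ring_model *),
				 struct tx_ring_model r, int cap)
{
	int rounds = 0;

	while (rounds < cap && poll(&r))
		rounds++;
	return rounds;
}
```

With a ring of { .pending = 1, .dma_done = 0 } (the skb is stored but
the DMA was never started), variant A keeps asking for another round
until the cap is hit, while variant B stops immediately. With
{ .pending = 1, .dma_done = 1 } both variants finish after one round.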

> [...]
> 
> >To fix properly with something that works on -RT and mainline I suggest
> >to revert this patch and add the following:
> 
> This patch cannot be reverted. (why would you?)
Because it does not fix a thing. It simply duct-tapes over the issue that
a TX cleanup does not clean up a thing while you assume that it did something.
You have a budget reserved for RX cleanup which you do not use up when possible.
You simply do one loop and leave.

> This patch fixes the issue from description.  I'm seeing no issues with
> P1010 now (on any kind of traffic), and the openwrt/tp-link guys also
> confirmed (on the powerpc list) that this patch addresses the issue on
> their end.
Just because the stall is gone doesn't make it good, since you had no
idea why it happened.

> If you encounter problems with the latest driver code, please submit a
> proper issue description indicating the code base you're using and so
> on.  Also make sure that the problem you're seeing wasn't already fixed
> by one of the latest gianfar fixes from net-next:
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git

I pointed out _why_ you saw the stall, and the fix involved not looping
endlessly in TX cleanup on not-yet-transmitted packets. The removal of
the outer loop was not required. The issue has been present since c233cf4,
which made it into the kernel in v3.10.

Sebastian
