Date:	Wed, 6 Jul 2016 06:42:57 +0000
From:	Yuval Mintz <Yuval.Mintz@...gic.com>
To:	Saeed Mahameed <saeedm@....mellanox.co.il>,
	Eric Dumazet <eric.dumazet@...il.com>
CC:	Saeed Mahameed <saeedm@...lanox.com>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>,
	Eric Dumazet <edumazet@...gle.com>,
	"Tom Herbert" <tom@...bertland.com>,
	Mohamad Haj Yahia <mohamad@...lanox.com>
Subject: RE: [PATCH net] net: poll tx timeout only on active tx queues

> >> > currently all the device driver call
> >> > netif_tx_start_all_queues(dev) on open to W/A this issue. which is
> >> > strange since only real_num_tx_queues are active.
> >>
> >> You could also argue that netif_tx_start_all_queues() should only
> >> enable the real_num_tx_queues.
> >> [Although that would obviously cause all drivers to reach the
> >> 'problem' you're currently fixing].
> >
> > Yep. Basically what I pointed out.
> >
> > It seems inconsistent to have loops using num_tx_queues, and others
> > using real_num_tx_queues.
> >
> > Instead of 'fixing' one of them, we should take a deeper look, even if
> > the change looks fine.
> >
> > num_tx_queues should be used in code that runs once, like
> > netdev_lockdep_set_classes(), but other loops should probably use
> > real_num_tx_queues.
> >
> > Anyway all these changes should definitely target net-next, not net
> > tree.
> >
> 
> But for the long term, you have a point.
> We will consider a deeper fix for net-next as you suggested, and drop this
> temporary fix.

I think we've actually hit an issue with qede [and a modified bnx2x]
due to netif_tx_start_all_queues() starting all Tx queues:
when the number of channels on an interface is reduced, the driver reloads,
after which the xmit function receives an SKB with a too-high txq.

Investigation seems to indicate that some TCP traffic arrived during the
reload, got enqueued on the qdisc with a high txq, and was then transmitted
as-is after Tx was re-enabled.
[Removing the modulo from bnx2x's select_queue() led to the same issue.]
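
To make the failure mode concrete, here is a rough user-space sketch of the
situation described above. It is not driver code: struct fake_netdev,
struct fake_skb, txq_is_valid() and clamp_txq() are hypothetical names
invented for illustration, and the modulo is only assumed to resemble what
a select_queue() implementation might do to keep the index in range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two per-device Tx queue counts. */
struct fake_netdev {
	uint16_t num_tx_queues;      /* allocated once, at device creation */
	uint16_t real_num_tx_queues; /* currently active; changes on reload */
};

/* Hypothetical stand-in for an SKB that already has a txq recorded,
 * e.g. because it was enqueued on the qdisc before the reload. */
struct fake_skb {
	uint16_t queue_mapping;
};

/* A txq is only usable if it is below the number of active queues. */
static int txq_is_valid(const struct fake_netdev *dev,
			const struct fake_skb *skb)
{
	return skb->queue_mapping < dev->real_num_tx_queues;
}

/* Assumed defensive clamp: fold a stale txq back into the active range. */
static uint16_t clamp_txq(const struct fake_netdev *dev, uint16_t txq)
{
	assert(dev->real_num_tx_queues > 0);
	return txq % dev->real_num_tx_queues;
}
```

With 8 active queues an SKB mapped to txq 6 is fine. Once the channel count
drops to 4, the same SKB fails txq_is_valid(), and clamp_txq() would map it
to queue 2. Without such a clamp the xmit path sees the stale index as-is.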
