netdev - RE: [PATCH net] net: poll tx timeout only on active tx queues


Message-ID: <CO2PR11MB0088E7024FA29C309D20B655973A0@CO2PR11MB0088.namprd11.prod.outlook.com>
Date:	Wed, 6 Jul 2016 06:42:57 +0000
From:	Yuval Mintz <Yuval.Mintz@...gic.com>
To:	Saeed Mahameed <saeedm@....mellanox.co.il>,
	Eric Dumazet <eric.dumazet@...il.com>
CC:	Saeed Mahameed <saeedm@...lanox.com>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>,
	Eric Dumazet <edumazet@...gle.com>,
	"Tom Herbert" <tom@...bertland.com>,
	Mohamad Haj Yahia <mohamad@...lanox.com>
Subject: RE: [PATCH net] net: poll tx timeout only on active tx queues

> >> > currently all the device driver call
> >> > netif_tx_start_all_queues(dev) on open to W/A this issue. which is
> >> > strange since only real_num_tx_queues are active.
> >>
> >> You could also argue that netif_tx_start_all_queues() should only
> >> enable the real_num_tx_queues.
> >> [Although that would obviously cause all drivers to reach the
> >> 'problem' you're currently fixing].
> >
> > Yep. Basically what I pointed out.
> >
> > It seems inconsistent to have loops using num_tx_queues, and others
> > using real_num_tx_queues.
> >
> > Instead of 'fixing' one of them, we should take a deeper look, even if
> > the change looks fine.
> >
> > num_tx_queues should be used in code that runs once, like
> > netdev_lockdep_set_classes(), but other loops should probably use
> > real_num_tx_queues.
> >
> > Anyway all these changes should definitely target net-next, not net
> > tree.
> >
> 
> But for the long term, you have a point.
> We will consider a deeper fix for net-next as you suggested, and drop this
> temporary fix.

I think we've actually managed to hit an issue with qede [& a modified bnx2x]
due to netif_tx_start_all_queues() starting all Tx-queues:
when the number of channels on an interface is reduced, the driver reloads,
after which the xmit function receives an SKB with a too-high txq.

Investigation seems to indicate that some TCP traffic arrived during the
reload, got enqueued on the qdisc with a high txq and then got transmitted
as-is after re-enabling Tx.
[Removing the modulo from bnx2x's select_queue() led to the same issue.]
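
To make the failure mode concrete, here is a rough userspace sketch of the
sequence described above. It is not kernel code: the model_dev and model_skb
structures and all model_* helpers are hypothetical stand-ins, and only the
field names num_tx_queues, real_num_tx_queues and queue_mapping mirror the
terms used in this thread. It assumes the txq is recorded at enqueue time and
is not re-evaluated when the active queue count shrinks.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a multiqueue device; not kernel code. */
struct model_dev {
	unsigned int num_tx_queues;      /* queues allocated once at setup */
	unsigned int real_num_tx_queues; /* queues currently active */
};

/* Hypothetical packet carrying the txq chosen when it was enqueued. */
struct model_skb {
	unsigned int queue_mapping;
};

/* Queue selection with a modulo over the active queue count. */
unsigned int model_select_queue(const struct model_dev *dev, unsigned int hash)
{
	return hash % dev->real_num_tx_queues;
}

/*
 * True when the mapping recorded at enqueue time points past the
 * currently active queues, i.e. the "too-high txq" case.
 */
bool model_txq_is_stale(const struct model_dev *dev, const struct model_skb *skb)
{
	return skb->queue_mapping >= dev->real_num_tx_queues;
}

/* Defensive remap at transmit time: fold the stale index into range. */
unsigned int model_cap_txq(const struct model_dev *dev, const struct model_skb *skb)
{
	return skb->queue_mapping % dev->real_num_tx_queues;
}
```

In this model, a packet hashed to txq 6 while 8 queues are active becomes
stale once real_num_tx_queues drops to 4, and the modulo folds it back to
txq 2. That mirrors why dropping the modulo exposes the problem; whether a
remap at xmit time is the right fix for a real driver is a separate question.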