Date:	Thu, 5 May 2016 13:16:37 -0400
From:	Doug Ledford <dledford@...hat.com>
To:	Saeed Mahameed <saeedm@....mellanox.co.il>
Cc:	Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: mlx5 core/en oops in 4.6-rc6+

On 05/05/2016 12:42 PM, Saeed Mahameed wrote:
> On Thu, May 5, 2016 at 7:00 PM, Doug Ledford <dledford@...hat.com> wrote:
>> Just had this pop up during testing, happened very soon after bootup:
>>

[ snip oops ]

> Hi Doug,
> 
> did you by chance configure TC queues for the netdev? i.e. dev->num_tc > 1
> if not, I would be happy to get more info on your network configuration.

That depends on which interface actually generated the oops.  If it was
the base interface, then I don't manually set any special params on it.
If it's one of the vlan interfaces, then there is a NetworkManager
dispatcher script that is intended to set the tc count on interface up:

[root@...a-virt-03 ~]$ more /etc/NetworkManager/dispatcher.d/98-mlx5_roce.4*
::::::::::::::
/etc/NetworkManager/dispatcher.d/98-mlx5_roce.43-egress.conf
::::::::::::::
#!/bin/sh
interface=$1
status=$2
[ "$interface" = mlx5_roce.43 ] || exit 0
case $status in
up)
	tc qdisc add dev mlx5_roce root mqprio num_tc 8 map 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
	# tc_wrap.py -i mlx5_roce -u 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
	;;
esac
::::::::::::::
/etc/NetworkManager/dispatcher.d/98-mlx5_roce.45-egress.conf
::::::::::::::
#!/bin/sh
interface=$1
status=$2
[ "$interface" = mlx5_roce.45 ] || exit 0
case $status in
up)
	tc qdisc add dev mlx5_roce root mqprio num_tc 8 map 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
	# tc_wrap.py -i mlx5_roce -u 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
	;;
esac
[root@...a-virt-03 ~]$


However, I should note that this usage of tc is a bit out of date last I
checked and doesn't even work any more.  Let me double check...

[root@...a-virt-02 vlan]$ cd /proc/net/vlan/
[root@...a-virt-02 vlan]$ ls
config  mlx5_roce.43  mlx5_roce.45
[root@...a-virt-02 vlan]$
[root@...a-virt-02 vlan]$ for i in *; do echo "$i:"; cat $i; echo; done
config:
VLAN Dev name	 | VLAN ID
Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD
mlx5_roce.45   | 45  | mlx5_roce
mlx5_roce.43   | 43  | mlx5_roce

mlx5_roce.43:
mlx5_roce.43  VID: 43	 REORDER_HDR: 1  dev->priv_flags: 1001
         total frames received           57
          total bytes received         5010
      Broadcast/Multicast Rcvd            0

      total frames transmitted           20
       total bytes transmitted         2525
Device: mlx5_roce
INGRESS priority mappings: 0:0  1:0  2:0  3:0  4:0  5:0  6:0 7:0
 EGRESS priority mappings: 0:3 1:3 2:3 3:3 4:3 5:3 6:3 7:3

mlx5_roce.45:
mlx5_roce.45  VID: 45	 REORDER_HDR: 1  dev->priv_flags: 1001
         total frames received           57
          total bytes received         5010
      Broadcast/Multicast Rcvd            0

      total frames transmitted           21
       total bytes transmitted         2603
Device: mlx5_roce
INGRESS priority mappings: 0:0  1:0  2:0  3:0  4:0  5:0  6:0 7:0
 EGRESS priority mappings: 0:5 1:5 2:5 3:5 4:5 5:5 6:5 7:5

OK, so the vlans have egress mappings, but they don't match what the
mlx5_roce.43 egress.conf file should have enabled.  Digging a little
further on this machine:

[root@...a-virt-03 vlan]$ more /etc/sysconfig/network-scripts/ifcfg-mlx5_roce.4?
::::::::::::::
/etc/sysconfig/network-scripts/ifcfg-mlx5_roce.43
::::::::::::::
DEVICE=mlx5_roce.43
VLAN=yes
VLAN_ID=43
VLAN_EGRESS_PRIORITY_MAP=0:3,1:3,2:3,3:3,4:3,5:3,6:3,7:3
TYPE=Vlan
ONBOOT=yes
BOOTPROTO=dhcp
DEFROUTE=no
PEERDNS=no
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=no
IPV6_PEERDNS=no
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=mlx5_roce.43
::::::::::::::
/etc/sysconfig/network-scripts/ifcfg-mlx5_roce.45
::::::::::::::
DEVICE=mlx5_roce.45
VLAN=yes
VLAN_ID=45
VLAN_EGRESS_PRIORITY_MAP=0:5,1:5,2:5,3:5,4:5,5:5,6:5,7:5
TYPE=Vlan
ONBOOT=yes
BOOTPROTO=dhcp
DEFROUTE=no
PEERDNS=no
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=no
IPV6_PEERDNS=no
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=mlx5_roce.45
[root@...a-virt-03 vlan]$

This is a Fedora rawhide machine, using NetworkManager to handle the
network interfaces.  So, the egress priority mappings are being set by
NM.  I don't know if they are overriding the egress mapping dispatchers
or if the egress mapping dispatchers are failing to work/run properly.
It might be the latter.  Let me double check the command...
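One way to confirm that the /proc values come from the ifcfg files is to
normalise both into the same form and compare them.  The following is
only a sketch: the helper names ifcfg_map_to_proc and proc_egress_map
are made up for illustration, and the /proc check only runs if the file
exists on the machine where it is tried.

```shell
#!/bin/sh
# Convert an ifcfg-style map ("0:3,1:3") to the space-separated form
# that /proc/net/vlan/<dev> prints ("0:3 1:3").
ifcfg_map_to_proc() {
	echo "$1" | tr ',' ' '
}

# Read /proc/net/vlan/<dev> text on stdin and print only the egress map.
# The INGRESS line is skipped because it does not start with "EGRESS".
proc_egress_map() {
	sed -n 's/^ *EGRESS priority mappings: *//p' | tr -s ' ' | sed 's/ *$//'
}

# Value copied from ifcfg-mlx5_roce.43 above.
want=$(ifcfg_map_to_proc "0:3,1:3,2:3,3:3,4:3,5:3,6:3,7:3")
proc_file=/proc/net/vlan/mlx5_roce.43
if [ -r "$proc_file" ]; then
	have=$(proc_egress_map < "$proc_file")
	if [ "$want" = "$have" ]; then
		echo "match"
	else
		echo "mismatch: want '$want', have '$have'"
	fi
fi
```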

OK, re-reading the egress dispatchers above: they work on the base
interface, not on the vlan interface that triggers them.  That's why
they both use the same command (mapping to egress 5) instead of being
like the ifcfg files, which map the 43 vlan to egress priority 3 and
the 45 vlan to egress priority 5.  Running tc qdisc | grep mlx5_roce
shows that the egress mapping is being applied, although I'm not sure it
should be.  I made that mapping many kernels ago, when that was the
right thing to do; the modern mlx5 ethernet drivers create their own
mappings that are drastically different.
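For reference, the mqprio "map" argument lists one traffic class per
packet priority, indexed from priority 0 up to 15.  The sketch below
just prints that table for the map used in both dispatcher scripts; the
helper name describe_mqprio_map is hypothetical and nothing here touches
a real interface.

```shell
#!/bin/sh
# Print which traffic class each priority lands in for an mqprio "map"
# argument.  Arguments are the map values in order, priority 0 first.
describe_mqprio_map() {
	prio=0
	for tc in "$@"; do
		echo "prio $prio -> tc $tc"
		prio=$((prio + 1))
	done
}

# The map from the dispatcher scripts: all 16 priorities go to tc 5.
describe_mqprio_map 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
```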

So, to answer your question: yes, num_tc > 1 (num_tc == 8), and I
probably need to reconfigure that egress dispatcher to do what I want on
modern kernels.  All I want is for packets from specific interfaces to
be tagged with specific vlan priorities so that per-priority flow
control between the card and switch works properly: the base interface
is supposed to have no priority tag, the 43 vlan is supposed to have
priority tag 3, and vlan 45 is supposed to have priority tag 5.
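As a rough illustration of the per-vlan tagging described here, the
sketch below builds the iproute2 commands that would set the vlan egress
priority map on each vlan interface rather than on the base interface.
It only prints the commands; vlan_egress_cmd is a made-up helper, and
whether this alone is enough for per-priority flow control with current
mlx5 drivers is an assumption, not something established in this thread.

```shell
#!/bin/sh
# Build (but do not run) an "ip link" command that maps every skb
# priority 0..7 on a vlan interface to a single vlan priority tag.
vlan_egress_cmd() {
	dev=$1
	pcp=$2
	map=""
	for prio in 0 1 2 3 4 5 6 7; do
		map="$map $prio:$pcp"
	done
	echo "ip link set dev $dev type vlan egress-qos-map$map"
}

# vlan 43 should carry priority tag 3, vlan 45 priority tag 5.
vlan_egress_cmd mlx5_roce.43 3
vlan_egress_cmd mlx5_roce.45 5
```

These are the same values that the ifcfg files already request through
VLAN_EGRESS_PRIORITY_MAP, so the printed commands should only be needed
if NetworkManager is not applying them.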

-- 
Doug Ledford <dledford@...hat.com>
              GPG KeyID: 0E572FDD


