Date:   Thu, 19 Nov 2020 20:37:27 -0500
From:   Limin Wang <lwang.nbl@...il.com>
To:     netdev@...r.kernel.org
Subject: LRO: creating vlan subports affects parent port's LRO settings

Under relatively recent kernels (v4.4+), creating a vlan subport on an
LRO-capable parent NIC may turn LRO off on the parent port and
effectively make its LRO setting unchangeable.

This is easy to reproduce on different distros and is independent of
the NIC vendor.
Hopefully this is not a repeat post of a known issue.

The example below is from Ubuntu 18.04 LTS. (CentOS 7.6 behaves slightly
differently, but the end result is the same; that example is attached at the end.)
===========================================================================
# Ubuntu 18.04 LTS
root@...ver1:# uname -a
Linux server1 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# Mellanox NIC
root@...ver1:# /sbin/ethtool -i ens4f0
driver: mlx5_core
version: 5.0-2.1.8

# enable LRO on the NIC
root@...ver1:# /sbin/ethtool -k ens4f0 | grep large
large-receive-offload: off
root@...ver1:# /sbin/ethtool -K ens4f0 lro on
root@...ver1:# /sbin/ethtool -k ens4f0 | grep large
large-receive-offload: on

# create a vlan subport; once the subport is up, LRO on the parent port is disabled
root@...ver1:# ip link add link ens4f0 name ens4f0.50 type vlan id 50
root@...ver1:# ifconfig ens4f0.50 up
root@...ver1:# ethtool -k ens4f0.50 | grep large
large-receive-offload: off [fixed]
root@...ver1:# ethtool -k ens4f0 | grep large
large-receive-offload: off

# manually enabling LRO on the parent port no longer works
root@...ver1:# /sbin/ethtool -K ens4f0 lro on
Could not change any device features
root@...ver1:# /sbin/ethtool -K ens4f0.50 lro on
Cannot change large-receive-offload
Could not change any device features
root@...ver1:# /sbin/ethtool -K ens4f0 lro on
Could not change any device features
root@...ver1:# ethtool -k ens4f0 | grep large
large-receive-offload: off [requested on]

# Now the only way to re-enable LRO on the parent port is to remove the subport
root@...ver1:# ip link del ens4f0.50
root@...ver1:# /sbin/ethtool -k ens4f0 | grep large
large-receive-offload: off [requested on]
root@...ver1:# /sbin/ethtool -K ens4f0 lro on
root@...ver1:# ethtool -k ens4f0 | grep large
large-receive-offload: on
===========================================================================

Although LRO may have other implications or issues in practice,
this seems like a simple use case that should work: enabling LRO on the
physical NIC while also having vlans on the same NIC port.
Note that neither the parent port nor the vlan subport is attached
to any bridge, bond, team, or ovs device; both are standalone.

This issue does not seem to be driver- or distro-related; it lies in the
kernel network stack.
When netdev features are changed (either from userspace via ethtool or
by other in-kernel processing), the work is ultimately done by
__netdev_update_features(), which calls
netdev_sync_upper_features() and netdev_sync_lower_features().
Both sync functions basically do one thing: make sure that
NETIF_F_UPPER_DISABLES is consistently enforced between upper and lower
net devices. Concretely, a lower device cannot enable such a feature
while an upper device has it off, and disabling it on an upper device
also disables it on the lower devices.
Currently, NETIF_F_UPPER_DISABLES only includes NETIF_F_LRO.
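To make the interaction concrete, here is a small user-space model in C of the two sync steps as described above. It is a simplified sketch, not kernel code: `struct model_dev`, `MODEL_F_LRO`, `MODEL_F_GRO`, `model_sync_upper()`, `model_sync_lower()` and `model_update_features()` are hypothetical stand-ins, each device has at most one upper and one lower, and the real functions in net/core/dev.c also deal with RCU, notifiers, and multiple uppers/lowers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits for this model only, not the kernel's values. */
typedef uint64_t model_features_t;
#define MODEL_F_LRO ((model_features_t)1 << 0)
#define MODEL_F_GRO ((model_features_t)1 << 1)
/* Stand-in for NETIF_F_UPPER_DISABLES, which currently holds only LRO. */
#define MODEL_F_UPPER_DISABLES MODEL_F_LRO

struct model_dev {
    model_features_t hw_features;     /* bits the user may toggle */
    model_features_t wanted_features; /* bits the user asked for */
    model_features_t features;        /* bits actually active */
    struct model_dev *upper;          /* e.g. a vlan_dev stacked on this dev */
    struct model_dev *lower;          /* e.g. the real_dev under a vlan_dev */
};

static void model_update_features(struct model_dev *dev);

/* Upper sync: drop an UPPER_DISABLES bit from this device's candidate
 * features whenever the upper device does not want that bit. */
static model_features_t model_sync_upper(const struct model_dev *upper,
                                         model_features_t features)
{
    return features & ~(MODEL_F_UPPER_DISABLES & ~upper->wanted_features);
}

/* Lower sync: when an UPPER_DISABLES bit is off on this (upper) device but
 * still on for the lower device, clear it from the lower's wanted set and
 * recompute the lower device. */
static void model_sync_lower(struct model_dev *lower, model_features_t features)
{
    model_features_t off = MODEL_F_UPPER_DISABLES & ~features & lower->features;
    if (off) {
        lower->wanted_features &= ~off;
        model_update_features(lower);
    }
}

/* Recompute dev->features, loosely following __netdev_update_features(). */
static void model_update_features(struct model_dev *dev)
{
    assert(dev != NULL);
    /* Simplification: active features = wanted bits the device can toggle. */
    model_features_t features = dev->wanted_features & dev->hw_features;
    if (dev->upper)
        features = model_sync_upper(dev->upper, features);
    dev->features = features;
    if (dev->lower)
        model_sync_lower(dev->lower, features);
}
```

In this model, updating the vlan device clears LRO from the parent's wanted set (the plain "off" in the Ubuntu transcript), and a later LRO request on the parent stays dropped ("off [requested on]") until the vlan is unlinked.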

A lot of thought must have gone into this logic, and many situations
are covered for upper_devs such as bond, team, and bridge.
However, could vlan_dev, which is an upper_dev of its parent real_dev,
be an oversight?
A vlan_dev is created with LRO unsupported by default (the NETIF_F_LRO
bit is not set in its hw_features), as the "[fixed]" in this output shows:
root@...ver1:# ethtool -k ens4f0.50 | grep large
large-receive-offload: off [fixed]

Therefore, following the upper and lower sync code paths above, once a
vlan_dev is created, the parent real_dev can no longer turn LRO on.
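Why the vlan_dev's missing LRO bit is permanent can be checked with an even smaller user-space model in C. The names `struct model_dev`, `model_register()`, `model_is_fixed()`, `model_request()` and `model_parent_may_keep_lro()` are hypothetical, and the registration and request rules are simplified assumptions loosely based on how the kernel initializes and updates wanted_features. Because LRO is not in the vlan_dev's hw_features, it is reported as "[fixed]", it never enters wanted_features, and no request can add it, so the upper sync keeps stripping LRO from the parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits for this model only. */
typedef uint64_t model_features_t;
#define MODEL_F_LRO ((model_features_t)1 << 0)
#define MODEL_F_GRO ((model_features_t)1 << 1)

struct model_dev {
    model_features_t hw_features;     /* user-toggleable bits */
    model_features_t features;        /* currently active bits */
    model_features_t wanted_features; /* user-requested bits */
};

/* At registration, wanted_features starts as the toggleable subset of the
 * active features (loosely after register_netdevice()). */
static void model_register(struct model_dev *dev)
{
    dev->wanted_features = dev->features & dev->hw_features;
}

/* ethtool reports a bit as "[fixed]" when it is not in hw_features. */
static bool model_is_fixed(const struct model_dev *dev, model_features_t f)
{
    return (dev->hw_features & f) == 0;
}

/* An ethtool-style request only changes bits that are in hw_features;
 * fixed bits are silently left untouched. */
static void model_request(struct model_dev *dev, model_features_t f, bool on)
{
    assert(f != 0); /* caller must name at least one feature */
    model_features_t valid = f & dev->hw_features;
    dev->wanted_features &= ~valid;
    if (on)
        dev->wanted_features |= valid;
}

/* Per the upper sync rule, the parent may keep LRO only while its upper
 * device (here, the vlan_dev) wants LRO. */
static bool model_parent_may_keep_lro(const struct model_dev *upper)
{
    return (upper->wanted_features & MODEL_F_LRO) != 0;
}
```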

Honestly, treating a vlan_dev as an upper_dev of its real_dev is a bit
counter-intuitive at first, since people call them vlan subports.
But from the perspective that a vlan_dev is a virtual device created
on top of real_dev, it does have some "upper_dev" flavor, similar to
bond/team devices.
The kernel also associates upper_dev with a "master" role, which makes
perfect sense for bond/team/bridge/ovs.
For a vlan_dev, however, the relationship looks more like that of a
slave to real_dev (some people call real_dev the parent port).
A secondary point: an upper_dev (bond/team/bridge) typically has more
than one lower_dev, so upper:lower is normally a 1:N relationship.
A vlan_dev has only one lower_dev, and upper:lower is often an N:1
relationship.

The upper/lower sync logic above probably stems from the "master" role
of upper_dev; vlan_dev just may not be a good fit for it.
That is probably where the confusion comes from.

Maybe I missed something, but this logic has been there for quite some
time (since v4.4; I did not try the latest kernel, but I did try
pre-v4.4 kernels, and they do not have this issue).

Feel free to correct me.

Now, two possible proposals to fix this (if it is considered an issue):
1. When creating/initializing a vlan_dev, set the NETIF_F_LRO bit in its
   hw_features based on the NETIF_F_LRO bit in the underlying real_dev's
   hw_features (and maybe set it in wanted_features as well, not just
   hw_features?).
2. In netdev_sync_upper_features() and netdev_sync_lower_features(),
   exclude any upper_dev that is also a vlan_dev.
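Both proposals can be sketched as a small, simplified user-space model in C. This is not kernel code: `struct model_dev`, its `is_vlan` flag (a hypothetical stand-in for whatever check would recognize a vlan_dev), `model_vlan_init()` and `model_sync_upper_skip_vlan()` are illustrative names, and only the LRO bit is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit for this model only. */
typedef uint64_t model_features_t;
#define MODEL_F_LRO ((model_features_t)1 << 0)
/* Stand-in for NETIF_F_UPPER_DISABLES (currently just LRO). */
#define MODEL_F_UPPER_DISABLES MODEL_F_LRO

struct model_dev {
    bool is_vlan;                     /* hypothetical "this is a vlan_dev" flag */
    model_features_t hw_features;     /* user-toggleable bits */
    model_features_t wanted_features; /* user-requested bits */
};

/* Proposal 1: a new vlan_dev inherits the LRO capability from its real_dev,
 * and (optionally) the real_dev's current LRO request as well. */
static void model_vlan_init(struct model_dev *vlan, const struct model_dev *real)
{
    assert(vlan != real);
    vlan->is_vlan = true;
    vlan->hw_features |= real->hw_features & MODEL_F_LRO;
    vlan->wanted_features |= real->wanted_features & MODEL_F_LRO;
}

/* Proposal 2: the upper sync ignores uppers that are vlan_devs; other uppers
 * (bond, team, bridge, ...) still strip UPPER_DISABLES bits they do not want. */
static model_features_t model_sync_upper_skip_vlan(const struct model_dev *upper,
                                                   model_features_t features)
{
    if (upper->is_vlan)
        return features;
    return features & ~(MODEL_F_UPPER_DISABLES & ~upper->wanted_features);
}
```

One trade-off is visible even in this model: under proposal 1, LRO becomes a toggleable bit on the vlan_dev, so turning it off there would still strip it from the parent, whereas proposal 2 takes the vlan_dev out of the sync entirely.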

Thanks for the attention.
Limin

P.S. Another example, on CentOS 7.6 with a VMXNET3 port:
===========================================================================
# CentOS Linux release 7.6.1810 (Core)
[root@...i-server]# uname -a
Linux esxi-server 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# VMXNET3 NIC
[root@...i-server]# ethtool -i ens224
driver: vmxnet3
version: 1.4.14.0-k-NAPI

# LRO enabled on the NIC
[root@...i-server]# ethtool -k ens224 | grep large
large-receive-offload: on

# create a vlan subport; NIC LRO is still on
[root@...i-server]# ip link add link ens224 name ens224.50 type vlan id 50
[root@...i-server]# ifconfig ens224.50 up
[root@...i-server]# ethtool -k ens224 | grep large
large-receive-offload: on
[root@...i-server]# ethtool -k ens224.50 | grep large
large-receive-offload: off [fixed]

# now turn LRO off; after that, LRO cannot be turned back on
[root@...i-server]# ethtool -K ens224 lro off
[root@...i-server]# ethtool -k ens224 | grep large
large-receive-offload: off
[root@...i-server]# ethtool -k ens224.50 | grep large
large-receive-offload: off [fixed]
[root@...i-server]# ethtool -K ens224 lro on
Could not change any device features
[root@...i-server]# ethtool -k ens224 | grep large
large-receive-offload: off [requested on]
[root@...i-server]# ethtool -k ens224.50 | grep large
large-receive-offload: off [fixed]

# Now the only way to re-enable LRO on the parent port is to remove the subport
[root@...i-server]# ip link del ens224.50
[root@...i-server]# ethtool -k ens224 | grep large
large-receive-offload: off [requested on]
[root@...i-server]# ethtool -K ens224 lro on
[root@...i-server]# ethtool -k ens224 | grep large
large-receive-offload: on
===========================================================================
