Message-ID: <CAHXsExy+zm+twpC9Qrs9myBre+5s_ApGzOYU45Pt=sw-FyOn1w@mail.gmail.com>
Date: Wed, 10 Jul 2024 16:45:55 -0400
From: Jason Zhou <jasonzhou@...om>
To: netdev@...r.kernel.org
Cc: Benjamin Mahler <bmahler@...om>
Subject: PROBLEM: Issue with setting veth MAC address being unreliable.

[1.] One line summary of the problem:

Setting the MAC address on a veth device is unreliable.

[2.] Full description of the problem/report:

Hello!

We have been investigating a strange behavior within Apache Mesos:
after we set the MAC address on a veth device to the same address as
our eth0 MAC address, the change is sometimes (~4% of the time in our
testing) not reflected correctly even though the ioctl call succeeds.
We also tried using libnl to set the MAC address, but the issue
persists.

The GitHub links at the end of this message point to the code where we
set the veth address, to clarify what we are trying to do. We first
create the veth pair [1] using a libnl function [2], then we set the
veth device MAC address to that of our host public interface (eth0)
[3] using a function called setMAC. Inside setMAC [4] is where we
observe the unreliable setting of veth addresses described above.

We observed this behavior by re-fetching the MAC address on the veth
device after making the function call that sets it. We have observed
the issue on CentOS 9 only, not on CentOS 7. On CentOS 9 we have tried
Linux kernels 5.15.147, 5.15.160 and 5.15.161. The CentOS 7 host was
running 5.10; we also upgraded it to 5.15.160 and still could not
reproduce the bug there.

We re-fetched the addresses via the SIOCGIFHWADDR ioctl as well as via
getifaddrs (which appears to use netlink under the covers). In
problematic cases, both reported a MAC address different from the
target address we had set. We also performed a fetch before setting
the MAC address and found instances where the getifaddrs and ioctl
results do not match for our veth device *even before we perform any
setting of the MAC address*. It is also worth noting that, after
setting the MAC address, neither ioctl nor getifaddrs ever comes back
with the same MAC address as before the set. So the set operation
always seems to have an effect.
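
For readers who want to repeat the re-fetch outside Mesos, here is a
minimal sketch of the ioctl read path. It is not the Mesos code. The
function names (format_mac, mac_via_ioctl, mac_via_sysfs) are made up
for illustration, and because Python's standard library does not wrap
getifaddrs, the second reading comes from sysfs as a stand-in rather
than from getifaddrs itself. It assumes a Linux host.

```python
import fcntl
import socket
import struct

SIOCGIFHWADDR = 0x8927  # from <linux/sockios.h>
IFNAMSIZ = 16           # from <linux/if.h>, includes the trailing NUL


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a lowercase colon-separated MAC string."""
    if len(raw) != 6:
        raise ValueError("expected 6 bytes, got %d" % len(raw))
    return ":".join("%02x" % b for b in raw)


def mac_via_ioctl(ifname: str) -> str:
    """Read the hardware address of ifname with ioctl(SIOCGIFHWADDR)."""
    name = ifname.encode()
    if len(name) >= IFNAMSIZ:
        raise ValueError("interface name too long")
    # struct ifreq: 16-byte name, then a union. 40 bytes is large
    # enough for the 64-bit layout.
    request = struct.pack("16s24x", name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        reply = fcntl.ioctl(sock.fileno(), SIOCGIFHWADDR, request)
    # ifr_hwaddr is a struct sockaddr: 2 bytes of sa_family, then
    # sa_data, whose first 6 bytes hold the MAC address.
    return format_mac(reply[18:24])


def mac_via_sysfs(ifname: str) -> str:
    """Stand-in second source (NOT getifaddrs): read the sysfs file."""
    with open("/sys/class/net/%s/address" % ifname) as f:
        return f.read().strip().lower()
```

Calling both functions on the same veth name right after the set, and
comparing each result with the target address, gives the kind of
cross-check described above.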

Observed scenarios with incorrectly assigned MAC addresses:

(1) After setting the MAC address: ioctl returns the correct MAC
address, but getifaddrs returns an incorrect MAC address (which also
differs from the original value before the set!)

(2) After setting the MAC address: ioctl and getifaddrs return the
same MAC address, but it is wrong (and also differs from the original
one!)

(3) The MAC address we set may end up overwritten by a garbage value
*after* we have already updated it and verified that it was set
correctly. Because this happens after the function that sets the MAC
address has returned, we can neither log nor detect it there, and we
have not yet determined at what point the late overwrite occurs. This
is the rarest scenario we have encountered, and we were only able to
reproduce it on our testing cluster machine, not on any of the
production cluster machines.
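
To make scenarios (1) and (2) concrete, the following sketch labels a
single pair of readings taken right after the set. The function name
and the returned labels are hypothetical and only meant for
illustration. Scenario (3) cannot be recognized from one pair of
readings, since it only shows up later.

```python
def classify_readback(target: str, via_ioctl: str, via_getifaddrs: str) -> str:
    """Label one post-set readback against the scenarios listed above."""
    want = target.lower()
    a = via_ioctl.lower()
    b = via_getifaddrs.lower()
    if a == want and b == want:
        return "ok"
    if a == want:
        # ioctl is right, getifaddrs is wrong.
        return "scenario 1"
    if a == b:
        # Both agree with each other, but not with the target.
        return "scenario 2"
    # Any other mismatch is not one of the listed scenarios.
    return "other"
```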

[3.] Keywords:

networking, veth, kernel, MAC, netlink

[X.] Other notes, patches, fixes, workarounds:

Notes:

For security reasons, more specific kernel and environment information
is available only on request. Please let us know if you are interested
and we will be happy to provide the necessary information.

So far we have observed this behavior only on CentOS 9 systems. CentOS
7 systems under various kernels do not seem to have the issue (which
would be quite strange if this were purely a kernel bug).

We have tried kernels 5.15.147, 5.15.160 and 5.15.161. All of them
show this issue on CentOS 9.

We have also tried rewriting our function for setting the MAC address
to use libnl rather than ioctl, but that did not eliminate the issue.

To work around this bug, we check that the MAC address is set
correctly after the ioctl set call, and retry the set if necessary. In
our testing, this workaround appears to remedy scenarios (1) and (2)
above, but it does not address scenario (3). You can see it here:

https://github.com/apache/mesos/commit/8b202bbebdc89429ad82c6983aa1c514eb1b8d95
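
The general shape of a set-verify-retry loop is sketched below. This
is not the code from the commit above, only an illustration of the
idea. The function name, the callable parameters, and the default of
three attempts are assumptions made for this example.

```python
from typing import Callable


def set_mac_with_retry(
    set_mac: Callable[[str], None],
    get_mac: Callable[[], str],
    target: str,
    max_attempts: int = 3,  # assumed default, not taken from Mesos
) -> int:
    """Set the MAC, read it back, and retry on mismatch.

    Returns the number of attempts used, or raises RuntimeError if
    the readback never matched the target.
    """
    want = target.lower()
    for attempt in range(1, max_attempts + 1):
        set_mac(want)
        if get_mac().lower() == want:
            return attempt
    raise RuntimeError(
        "MAC address still not %s after %d attempts" % (want, max_attempts)
    )
```

Passing the setter and getter as callables keeps the loop independent
of whether ioctl or libnl does the actual work. Like the workaround
described above, it can only catch mismatches that are visible at
readback time, so it does not help with scenario (3).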

We would greatly appreciate any insights or guidance on this matter.
Please let me know if you need further information or if there are any
specific tests we should run to assist in diagnosing the issue. Again,
specific details for the production machines on which we encountered
this error can be provided upon request, so please let us know if
there is anything we can provide to help.

Thank you for your time and assistance.

Best regards,
Jason Zhou
Software Engineering Intern
jasonzhou@...om

embedded links:
[1] https://github.com/apache/mesos/blob/8cf287778371c13ee7e88fa428424b3c0fbc7ff0/src/slave/containerizer/mesos/isolators/network/port_mapping.cpp#L3599
[2] https://github.com/apache/mesos/blob/8cf287778371c13ee7e88fa428424b3c0fbc7ff0/src/linux/routing/link/veth.cpp#L45
[3] https://github.com/apache/mesos/blob/8cf287778371c13ee7e88fa428424b3c0fbc7ff0/src/slave/containerizer/mesos/isolators/network/port_mapping.cpp#L3628
[4] https://github.com/apache/mesos/blob/8cf287778371c13ee7e88fa428424b3c0fbc7ff0/src/linux/routing/link/link.cpp#L283
