Message-ID: <CAHTA-uZDaJ-71o+bo8a96TV4ck-8niimztQFaa=QoeNdUm-9wg@mail.gmail.com>
Date: Wed, 11 Sep 2024 17:20:29 -0500
From: Mitchell Augustin <mitchell.augustin@...onical.com>
To: "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Jiri Pirko <jiri@...nulli.us>, 
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Lorenzo Bianconi <lorenzo@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Jacob Martin <jacob.martin@...onical.com>, dann frazier <dann.frazier@...onical.com>
Subject: Namespaced network devices not cleaned up properly after execution of
 pmtu.sh kernel selftest

Hello,

We recently identified a bug, still present upstream, that is
occasionally triggered by one of the kernel selftests (net/pmtu.sh)
and causes the following behavior:
* One of this test's namespaced network devices does not get properly
cleaned up when the namespace is destroyed, evidenced by
`unregister_netdevice: waiting for veth_A-R1 to become free. Usage
count = 5` appearing in the dmesg output repeatedly
* Once we start to see the above `unregister_netdevice` message, an
un-cancelable hang will occur on subsequent attempts to run `modprobe
ip6_vti` or `rmmod ip6_vti`

Jacob and I have both investigated various conditions under which this
bug state does / does not occur, which are documented more thoroughly
in the following BugLink:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2072501

We expect that veth_A-R1's refcount should be cleaned up by the time
execution of pmtu.sh finishes since the relevant namespaces are
deleted during cleanup of the test suite. We've observed this behavior
on several kernels, from stable branches at least as old as
linux-6.1.y to as recent as v6.11-rc6, so this does not seem to be a
new regression. (I have not yet had a chance to test rc7.)

This issue also occurs only very infrequently, and reproducibility is
extremely sensitive to very minor timing variations in the pmtu.sh
test case. In fact, I was unable to reproduce the bug with the
version of pmtu.sh and lib.sh in v6.11-rc6. That is not because the
kernel is unaffected (it is affected, as confirmed by running an older
kernel's pmtu.sh on it), but because v6.11-rc6 introduces some
unrelated functional changes to the tests that cause a slightly longer
test execution time.
It is also difficult to reproduce the bug on slower CPUs, or even on
faster CPUs where the cpufreq scaling governor is not set to
`performance`.
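
Since the governor setting matters for reproduction, a quick check along
these lines may help confirm it before starting. This is only a sketch: it
assumes the standard cpufreq sysfs layout under /sys/devices/system/cpu,
and the function name `check_governors` and its optional root-directory
argument are hypothetical, introduced here for illustration.

```shell
# Report whether every CPU uses the "performance" cpufreq governor.
# The optional first argument (an assumption, for testing) overrides the
# sysfs root that is scanned.
check_governors() {
    root="${1:-/sys/devices/system/cpu}"
    found=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip unmatched globs and unreadable entries.
        [ -r "$f" ] || continue
        found=1
        gov=$(cat "$f")
        if [ "$gov" != "performance" ]; then
            echo "$f: $gov"
            return 1
        fi
    done
    if [ "$found" -ne 1 ]; then
        echo "no cpufreq governor files found under $root"
        return 2
    fi
    echo "all CPUs use the performance governor"
    return 0
}
```

Calling `check_governors` with no argument inspects the running system; a
non-zero return means at least one CPU is on a different governor, or that
no governor files were found.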

However, I can easily reproduce the issue on an Nvidia Grace/Hopper
machine (and other platforms with modern CPUs) with the performance
governor set by doing the following:
* Install/boot any affected kernel
* Clone the kernel tree just to get an older version of the test cases
without subtle timing changes that mask the issue (such as
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/tree/?h=Ubuntu-6.8.0-39.39)
* cd tools/testing/selftests/net
* while true; do sudo ./pmtu.sh pmtu_ipv6_ipv6_exception; done

If running on an appropriately fast CPU, you should start seeing
`unregister_netdevice: waiting for veth_A-R1 to become free. Usage
count = 5` in dmesg at some point. (On Grace/Hopper, it happens in
under a minute, reliably). After that point, attempts to interact with
ip6_vti will hang.
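
For unattended runs, the loop above could be wrapped so that it stops as
soon as the message shows up. The following is a rough sketch, not part of
the selftests: the helper names `log_shows_leak` and `repro_loop` and the
`max_runs` cap are hypothetical. It assumes it is run from
tools/testing/selftests/net and that the kernel log has been cleared
beforehand (for example with `sudo dmesg -C`), since a stale message from
an earlier run would also match.

```shell
# Return 0 if the kernel log text on stdin contains the stuck-refcount message.
log_shows_leak() {
    grep -q 'unregister_netdevice: waiting for .* to become free'
}

# Run the test case repeatedly until the message appears in dmesg.
# max_runs is a hypothetical safety cap so the loop always terminates.
repro_loop() {
    max_runs="${1:-1000}"
    i=0
    while [ "$i" -lt "$max_runs" ]; do
        i=$((i + 1))
        sudo ./pmtu.sh pmtu_ipv6_ipv6_exception >/dev/null 2>&1
        if sudo dmesg | log_shows_leak; then
            echo "message seen after $i run(s)"
            return 0
        fi
    done
    echo "message not seen in $max_runs runs"
    return 1
}
```

Stopping at the first occurrence avoids piling more test runs on top of a
system that is already in the bad state.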

Please let me know if there is any other info I can provide to assist
in debugging this.

Thanks,
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
