Message-ID: <20231018154804.420823-1-atenart@kernel.org>
Date: Wed, 18 Oct 2023 17:47:42 +0200
From: Antoine Tenart <atenart@...nel.org>
To: davem@...emloft.net,
	kuba@...nel.org,
	pabeni@...hat.com,
	edumazet@...gle.com
Cc: Antoine Tenart <atenart@...nel.org>,
	netdev@...r.kernel.org,
	gregkh@...uxfoundation.org,
	mhocko@...e.com,
	stephen@...workplumber.org
Subject: [RFC PATCH net-next 0/4] net-sysfs: remove rtnl_trylock/restart_syscall use

Hi,

I'm sending this as an RFC because I believe it should be discussed (and
some might want to do additional testing), but the code itself is ready.

Some time ago we tried to improve the rtnl_trylock/restart_syscall
situation[1]. When rtnl is contended, userspace processes accessing net
sysfs attributes spin and experience delays: the attribute handlers call
rtnl_trylock() and, when it fails, return restart_syscall(), so the
syscall is restarted over and over instead of sleeping on the lock. This
can happen in various situations where sysfs attributes are accessed
(networking daemons, configuration, monitoring) while operations under
rtnl are being performed (veth creation, driver configuration, etc.). A
few improvements can be made in userspace to ease things, such as using
the netlink interface instead, or polling the attributes less often (or
more selectively); but in the end the root cause remains and this keeps
happening from time to time.
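For illustration, the contended path can be modelled in userspace. The sketch below is a hypothetical analogy, not kernel code: a pthread mutex (fake_rtnl) stands in for rtnl, returning -EAGAIN stands in for restart_syscall(), and the retry loop in read_attr() stands in for the kernel transparently restarting the read(2). The names fake_rtnl, fake_attr_value, show_attr_trylock() and read_attr() are all made up for this sketch.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the rtnl mutex (userspace analogy only). */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;
/* Hypothetical attribute value, e.g. something MTU-like. */
static int fake_attr_value = 1500;

/* Hypothetical "show" handler mimicking the pattern
 *     if (!rtnl_trylock())
 *             return restart_syscall();
 * Instead of sleeping on the lock, it bails out immediately. */
static int show_attr_trylock(int *out)
{
    if (pthread_mutex_trylock(&fake_rtnl) != 0)
        return -EAGAIN; /* stand-in for restart_syscall() */
    *out = fake_attr_value;
    pthread_mutex_unlock(&fake_rtnl);
    return 0;
}

/* Hypothetical model of the syscall being restarted: the caller retries
 * until the lock is free or max_tries is exhausted. *tries reports how
 * many attempts were made, i.e. how long the reader spun. */
static int read_attr(int *out, int max_tries, int *tries)
{
    int ret = -EAGAIN;

    for (*tries = 0; *tries < max_tries && ret == -EAGAIN; (*tries)++)
        ret = show_attr_trylock(out);
    return ret;
}
```

While another operation holds the lock, every attempt fails and the reader burns through retries rather than sleeping; once the lock is released, a single attempt succeeds. The commonly cited reason for the trylock, rather than a plain blocking lock, is to avoid deadlocking against device unregistration, which holds rtnl while removing the very sysfs files a blocked reader would be using.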

That initial effort, however, wasn't successful, mostly because we found
technical flaws and didn't find a working solution at the time, although
I think there was interest. Some time later, we gave it another try and
found something more promising, but the patches fell off my radar. I
recently had another look at this series, did more testing and cleaned
it up.

The technical details are described directly in the code comments of
patch 1, with an additional important comment in patch 3. The series was
mostly tested by stress-testing net sysfs attributes (read/write
operations) while adding/removing queues and adding/removing veths, all
in parallel.
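The reader side of such a stress test could look roughly like the hypothetical helper below; it is not the author's actual test tooling. It repeatedly opens and reads one attribute file, as a polling daemon would, and records the slowest read. Pointing it at a file such as /sys/class/net/<dev>/mtu while veths or queues are added and removed in parallel would surface the delays described above; the function name poll_attr() and its parameters are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read the attribute at `path` `iterations` times,
 * re-opening it on every pass like a polling daemon, and store the slowest
 * open+read in *max_ns (nanoseconds). Returns the number of successful
 * reads; failed opens or empty reads are skipped, not counted. */
static int poll_attr(const char *path, int iterations, int64_t *max_ns)
{
    char buf[64];
    int ok = 0;

    *max_ns = 0;
    for (int i = 0; i < iterations; i++) {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            continue;
        ssize_t n = read(fd, buf, sizeof(buf));
        close(fd);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (n <= 0)
            continue;

        int64_t ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
                     (t1.tv_nsec - t0.tv_nsec);
        if (ns > *max_ns)
            *max_ns = ns;
        ok++;
    }
    return ok;
}
```

Several instances of this reader, plus writers and scripts creating and deleting veths or changing queue counts, would be run concurrently; a large max_ns under contention is the symptom the series aims to remove.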

All comments are welcome.

Thanks,
Antoine

[1] https://lore.kernel.org/all/20210928125500.167943-1-atenart@kernel.org/T/

Antoine Tenart (4):
  net-sysfs: remove rtnl_trylock from device attributes
  net-sysfs: move queue attribute groups outside the default groups
  net-sysfs: prevent uncleared queues from being re-added
  net-sysfs: remove rtnl_trylock from queue attributes

 include/linux/netdevice.h     |   1 +
 include/net/netdev_rx_queue.h |   1 +
 net/core/net-sysfs.c          | 329 ++++++++++++++++++++++++----------
 3 files changed, 237 insertions(+), 94 deletions(-)

-- 
2.41.0
