lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 24 May 2023 13:12:58 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: netdev@...r.kernel.org
Cc: Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>,
	Kurt Kanzenbach <kurt.kanzenbach@...utronix.de>,
	Paolo Abeni <pabeni@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: [RFC PATCH 1/2] net: Add sysfs files for threaded NAPI.

I've been looking into threaded NAPI. One awkward thing to do is
to figure out the thread names, pids in order to adjust the thread
priorities and SMP affinity.
On PREEMPT_RT the NAPI thread is treated (by the user) the same way as
the threaded interrupt which means a dedicate CPU affinity for the
thread and a higher task priority to be favoured over other tasks on the
CPU. Otherwise the NAPI thread can be preempted by other threads leading
to delays in packet delivery.
Having to run ps/ grep is awkward to get the PID right. It is not easy
to match the interrupt since there is no obvious relation between the
IRQ and the NAPI thread.
NAPI threads are enabled often to mitigate the problems caused by a
"pending" ksoftirqd (which has been mitigated recently by doing softiqrs
regardless of ksoftirqd status). There is still the part that the NAPI
thread does not use softnet_data::poll_list.

To make things easier to setup NAPI threads here is a sysfs interfaces.
It provides for each NAPI instance a folder containing the name and PID
of the NAPI thread and an interrupt number of the interrupt scheduling
the NAPI thread. The latter requires support from the driver.
The name of the napi-instance can also be set by driver so it does not
fallback to the NAPI-id.

I've been thinking to wire up task affinity to follow the affinity of
the interrupt thread. While this would require some extra work, it
shouldn't be needed since the PID of the NAPI thread and interrupt
number is exposed so the user may use chrt/ taskset to adjust the
priority accordingly and the interrupt affinity does not change
magically.

Having said all that, there is still no generic solution to the
"overload" problem. Part of the problem is lack of policy since the
offload to ksoftirqd is not welcomed by everyone. Also "better" cards
support filtering by ether type which allows to filter the problematic
part to another NAPI instance avoiding the prbolem.

This is what the structure looks with the igb driver after adding the
name/ irq hints (second patch).

| root@box:/sys/class/net# cd eno0
| root@box:/sys/class/net/eno0# ls -l napi
| total 0

Empty before threaded NAPI is enabled.

| root@box:/sys/class/net/eno0# echo 1 > threaded
| root@box:/sys/class/net/eno0# ls -l napi
| total 0
| drwxr-xr-x 2 root root 0 May 24 09:42 eno0-TxRx-0
| drwxr-xr-x 2 root root 0 May 24 09:42 eno0-TxRx-1
| drwxr-xr-x 2 root root 0 May 24 09:42 eno0-TxRx-2
| drwxr-xr-x 2 root root 0 May 24 09:42 eno0-TxRx-3
| drwxr-xr-x 2 root root 0 May 24 09:42 eno0-TxRx-4
| drwxr-xr-x 2 root root 0 May 24 09:42 eno0-TxRx-5
| drwxr-xr-x 2 root root 0 May 24 09:42 eno0-TxRx-6
| drwxr-xr-x 2 root root 0 May 24 09:42 eno0-TxRx-7

Deployed using names supplied by the driver which map the names which
are used for the IRQ.

| root@box:/sys/class/net/eno0# grep . napi/*/*
| napi/eno0-TxRx-0/interrupt:37
| napi/eno0-TxRx-0/name:napi/eno0-8193
| napi/eno0-TxRx-0/pid:2253
| napi/eno0-TxRx-1/interrupt:38
| napi/eno0-TxRx-1/name:napi/eno0-8194
| napi/eno0-TxRx-1/pid:2252
| napi/eno0-TxRx-2/interrupt:39
| napi/eno0-TxRx-2/name:napi/eno0-8195
| napi/eno0-TxRx-2/pid:2251
| napi/eno0-TxRx-3/interrupt:40
| napi/eno0-TxRx-3/name:napi/eno0-8196
| napi/eno0-TxRx-3/pid:2250
| napi/eno0-TxRx-4/interrupt:41
| napi/eno0-TxRx-4/name:napi/eno0-8197
| napi/eno0-TxRx-4/pid:2249
| napi/eno0-TxRx-5/interrupt:42
| napi/eno0-TxRx-5/name:napi/eno0-8198
| napi/eno0-TxRx-5/pid:2248
| napi/eno0-TxRx-6/interrupt:43
| napi/eno0-TxRx-6/name:napi/eno0-8199
| napi/eno0-TxRx-6/pid:2247
| napi/eno0-TxRx-7/interrupt:44
| napi/eno0-TxRx-7/name:napi/eno0-8200
| napi/eno0-TxRx-7/pid:2246

Thread name, pid and interrupt number as provided by the driver.

| root@box:/sys/class/net/eno0# grep eno0-TxRx-7 /proc/interrupts | sed 's@ \+@ @g'
|  44: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 123 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:07:00.0 8-edge eno0-TxRx-7
| root@box:/sys/class/net/eno0# cat /proc/irq/44/smp_affinity_list
| 0-7,16-23
| root@box:/sys/class/net/eno0# cat /proc/irq/44/effective_affinity_list
| 18

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
---
 include/linux/netdevice.h |   6 ++
 net/core/dev.c            |  13 ++++
 net/core/net-sysfs.c      | 137 ++++++++++++++++++++++++++++++++++++++
 net/core/net-sysfs.h      |  12 ++++
 4 files changed, 168 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a937b9329af52..34b584b4d5d94 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -376,6 +376,9 @@ struct napi_struct {
 	/* control-path-only fields follow */
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
+	struct kobject		kobj;
+	const char		*napi_name;
+	int			interrupt_num;
 };
 
 enum {
@@ -2411,6 +2414,7 @@ struct net_device {
 	struct rtnl_hw_stats64	*offload_xstats_l3;
 
 	struct devlink_port	*devlink_port;
+	struct kset		*napi_kset;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -2688,6 +2692,8 @@ static inline void netif_napi_del(struct napi_struct *napi)
 	synchronize_net();
 }
 
+void netif_napi_add_hints(struct napi_struct *napi, const char *name, unsigned int interrupt);
+
 struct packet_type {
 	__be16			type;	/* This is really htons(ether_type). */
 	bool			ignore_outgoing;
diff --git a/net/core/dev.c b/net/core/dev.c
index 318ae441df1f5..79789dbc1c521 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1375,7 +1375,10 @@ static int napi_kthread_create(struct napi_struct *n)
 		pr_err("kthread_run failed with err %d\n", err);
 		n->thread = NULL;
 	}
+	if (err)
+		return err;
 
+	err = napi_thread_add_kobj(n);
 	return err;
 }
 
@@ -6383,6 +6386,7 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	list_add_rcu(&napi->dev_list, &dev->napi_list);
 	napi_hash_add(napi);
 	napi_get_frags_check(napi);
+	napi->interrupt_num = -1;
 	/* Create kthread for this napi if dev->threaded is set.
 	 * Clear dev->threaded if kthread creation failed so that
 	 * threaded mode will not be enabled in napi_enable().
@@ -6392,6 +6396,14 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 }
 EXPORT_SYMBOL(netif_napi_add_weight);
 
+void netif_napi_add_hints(struct napi_struct *napi, const char *name,
+			  unsigned int interrupt)
+{
+	napi->napi_name = name;
+	napi->interrupt_num = interrupt;
+}
+EXPORT_SYMBOL_GPL(netif_napi_add_hints);
+
 void napi_disable(struct napi_struct *n)
 {
 	unsigned long val, new;
@@ -6464,6 +6476,7 @@ void __netif_napi_del(struct napi_struct *napi)
 	napi->gro_bitmask = 0;
 
 	if (napi->thread) {
+		napi_thread_remove_kobj(napi);
 		kthread_stop(napi->thread);
 		napi->thread = NULL;
 	}
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 15e3f4606b5f9..a050f8cd4913c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1820,6 +1820,136 @@ static int register_queue_kobjects(struct net_device *dev)
 	return error;
 }
 
+#ifdef CONFIG_SYSFS
+
+struct napi_thread_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct napi_struct *n, char *buf);
+	ssize_t (*store)(struct napi_struct *n, const char *buf, size_t len);
+};
+
+static ssize_t napi_thread_interrupt_show(struct napi_struct *n, char *buf)
+{
+	if (n->interrupt_num < 0)
+		return -EINVAL;
+	return sysfs_emit(buf, "%u\n", n->interrupt_num);
+}
+
+static ssize_t napi_thread_name_show(struct napi_struct *n, char *buf)
+{
+	char comm_buf[TASK_COMM_LEN];
+
+	get_task_comm(comm_buf, n->thread);
+	return sysfs_emit(buf, "%s\n", comm_buf);
+}
+
+static ssize_t napi_thread_pid_show(struct napi_struct *n, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", task_pid_nr(n->thread));
+}
+
+#define NAPI_THREAD_ATTR(__name)	\
+	static struct napi_thread_attribute thread_napi_##__name##_attribute __ro_after_init =	\
+	__ATTR(__name, 0444, napi_thread_##__name##_show, NULL)
+
+NAPI_THREAD_ATTR(interrupt);
+NAPI_THREAD_ATTR(name);
+NAPI_THREAD_ATTR(pid);
+
+static struct attribute *napi_thread_default_attrs[] __ro_after_init = {
+	&thread_napi_interrupt_attribute.attr,
+	&thread_napi_name_attribute.attr,
+	&thread_napi_pid_attribute.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(napi_thread_default);
+
+#define to_napi_struct_attr(_attr) \
+	container_of(_attr, struct napi_thread_attribute, attr)
+
+#define to_napi_struct(obj) container_of(obj, struct napi_struct, kobj)
+
+static ssize_t napi_thread_attr_show(struct kobject *kobj, struct attribute *attr,
+				     char *buf)
+{
+	const struct napi_thread_attribute *attribute = to_napi_struct_attr(attr);
+	struct napi_struct *n = to_napi_struct(kobj);
+	ssize_t ret = -EINVAL;
+
+	if (!attribute->show)
+		return -EIO;
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+	if (dev_isalive(n->dev))
+		ret = attribute->show(n, buf);
+
+	rtnl_unlock();
+	return ret;
+}
+
+static const struct sysfs_ops napi_thread_sysfs_ops = {
+	.show = napi_thread_attr_show,
+};
+
+static void napi_thread_release(struct kobject *kobj)
+{
+	memset(kobj, 0, sizeof(*kobj));
+}
+
+static const struct kobj_type napi_ktype = {
+	.sysfs_ops = &napi_thread_sysfs_ops,
+	.release = napi_thread_release,
+	.default_groups = napi_thread_default_groups,
+};
+
+int napi_thread_add_kobj(struct napi_struct *n)
+{
+	struct kobject *kobj;
+	char napi_name[32];
+	const char *name_ptr;
+	int ret;
+
+	if (n->napi_name) {
+		name_ptr = n->napi_name;
+	} else {
+		name_ptr = napi_name;
+		scnprintf(napi_name, sizeof(napi_name), "napi_id-%d", n->napi_id);
+	}
+	kobj = &n->kobj;
+	kobj->kset = n->dev->napi_kset;
+	ret = kobject_init_and_add(kobj, &napi_ktype, NULL,
+				   name_ptr);
+	return ret;
+}
+
+void napi_thread_remove_kobj(struct napi_struct *n)
+{
+	kobject_put(&n->kobj);
+}
+
+static int register_napi_kobjects(struct net_device *dev)
+{
+	dev->napi_kset = kset_create_and_add("napi",
+					     NULL, &dev->dev.kobj);
+	if (!dev->napi_kset)
+		return -ENOMEM;
+	return 0;
+}
+
+static int unregister_napi_kobjects(struct net_device *dev)
+{
+	kset_unregister(dev->napi_kset);
+	return 0;
+}
+
+#else /* !CONFIG_SYSFS */
+
+static int register_napi_kobjects(struct net_device *dev) { return 0; }
+static void unregister_napi_kobjects(struct net_device *dev) { }
+
+#endif
+
 static int queue_change_owner(struct net_device *ndev, kuid_t kuid, kgid_t kgid)
 {
 	int error = 0, real_rx = 0, real_tx = 0;
@@ -2009,6 +2139,7 @@ void netdev_unregister_kobject(struct net_device *ndev)
 	kobject_get(&dev->kobj);
 
 	remove_queue_kobjects(ndev);
+	unregister_napi_kobjects(ndev);
 
 	pm_runtime_set_memalloc_noio(dev, false);
 
@@ -2050,6 +2181,12 @@ int netdev_register_kobject(struct net_device *ndev)
 		return error;
 	}
 
+	error = register_napi_kobjects(ndev);
+	if (error) {
+		remove_queue_kobjects(ndev);
+		device_del(dev);
+		return -ENOMEM;
+	}
 	pm_runtime_set_memalloc_noio(dev, true);
 
 	return error;
diff --git a/net/core/net-sysfs.h b/net/core/net-sysfs.h
index 8a5b04c2699aa..6b185a309290d 100644
--- a/net/core/net-sysfs.h
+++ b/net/core/net-sysfs.h
@@ -11,4 +11,16 @@ int netdev_queue_update_kobjects(struct net_device *net,
 int netdev_change_owner(struct net_device *, const struct net *net_old,
 			const struct net *net_new);
 
+#ifdef CONFIG_SYSFS
+
+int napi_thread_add_kobj(struct napi_struct *n);
+void napi_thread_remove_kobj(struct napi_struct *n);
+
+#else
+
+static inline int napi_thread_add_kobj(struct napi_struct *n) { return 0; }
+static inline void napi_thread_remove_kobj(struct napi_struct *n) { }
+
+#endif
+
 #endif
-- 
2.40.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ