lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 30 Nov 2020 23:53:22 +0200
From:   Vladimir Oltean <olteanv@...il.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Stephen Hemminger <stephen@...workplumber.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        Paul Gortmaker <paul.gortmaker@...driver.com>,
        Jiri Benc <jbenc@...hat.com>,
        Or Gerlitz <ogerlitz@...lanox.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>
Subject: Re: Correct usage of dev_base_lock in 2020

On Mon, Nov 30, 2020 at 10:46:00PM +0100, Eric Dumazet wrote:
> You can not use dev_base_lock() or RCU and call an ndo_get_stats64()
> that could sleep.
>
> You can not for example start changing bonding, since bond_get_stats()
> could be called from non-sleepable context (net/core/net-procfs.c)
>
> I am still referring to your patch adding :
>
> +       if (!rtnl_locked)
> +               rtnl_lock();
>
> This is all I said.

Ah, ok, well I didn't show you all the patches, did I?

-----------------------------[cut here]-----------------------------
>From d62c65ef6cb357e1b8c5a4ab189718e157a569ae Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@....com>
Date: Thu, 26 Nov 2020 23:08:57 +0200
Subject: [PATCH] net: procfs: retrieve device statistics under RTNL, not RCU

In the effort of making .ndo_get_stats64 be able to sleep, we need to
ensure the callers of dev_get_stats do not use atomic context.

The /proc/net/dev file uses an RCU read-side critical section to ensure
the integrity of the list of network interfaces, because it iterates
through all net devices in the netns to show their statistics.
We still need some protection against an interface registering or
deregistering, and the writer-side lock, the RTNL mutex, is fine for
that, because it offers sleepable context.

Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
---
 net/core/net-procfs.c                   | 11 ++++++-----
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index c714e6a9dad4..1429e8c066d8 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/netdevice.h>
 #include <linux/proc_fs.h>
+#include <linux/rtnetlink.h>
 #include <linux/seq_file.h>
 #include <net/wext.h>
 
@@ -21,7 +22,7 @@ static inline struct net_device *dev_from_same_bucket(struct seq_file *seq, loff
 	unsigned int count = 0, offset = get_offset(*pos);
 
 	h = &net->dev_index_head[get_bucket(*pos)];
-	hlist_for_each_entry_rcu(dev, h, index_hlist) {
+	hlist_for_each_entry(dev, h, index_hlist) {
 		if (++count == offset)
 			return dev;
 	}
@@ -51,9 +52,9 @@ static inline struct net_device *dev_from_bucket(struct seq_file *seq, loff_t *p
  *	in detail.
  */
 static void *dev_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(RCU)
+	__acquires(rtnl_mutex)
 {
-	rcu_read_lock();
+	rtnl_lock();
 	if (!*pos)
 		return SEQ_START_TOKEN;
 
@@ -70,9 +71,9 @@ static void *dev_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 }
 
 static void dev_seq_stop(struct seq_file *seq, void *v)
-	__releases(RCU)
+	__releases(rtnl_mutex)
 {
-	rcu_read_unlock();
+	rtnl_unlock();
 }
 
 static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
-----------------------------[cut here]-----------------------------

and:

-----------------------------[cut here]-----------------------------
>From 0c51569116b3844d0d99831697a8e4134814d50e Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@....com>
Date: Mon, 30 Nov 2020 02:51:01 +0200
Subject: [PATCH] net: sysfs: retrieve device statistics unlocked

In the effort of making .ndo_get_stats64 be able to sleep, we need to
ensure the callers of dev_get_stats do not use atomic context.

I need to preface this by saying that I have no idea why netstat_show
takes the dev_base_lock rwlock. Two things are for certain:
(a) it's not due to dev_isalive requiring it for some mystical reason,
    because broadcast_show() also calls dev_isalive() and has had no
    problem existing since the beginning of git.
(b) the dev_get_stats function definitely does not need dev_base_lock
    protection either. In fact, holding the dev_base_lock is the entire
    problem here, because we want to make dev_get_stats sleepable, and
    holding a rwlock gives us atomic context.

So since no protection seems to be necessary, just run unlocked while
retrieving the /sys/class/net/eth0/statistics/* values.

Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
---
 net/core/net-sysfs.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 999b70c59761..0782a476b424 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -585,14 +585,13 @@ static ssize_t netstat_show(const struct device *d,
 	WARN_ON(offset > sizeof(struct rtnl_link_stats64) ||
 		offset % sizeof(u64) != 0);
 
-	read_lock(&dev_base_lock);
 	if (dev_isalive(dev)) {
 		struct rtnl_link_stats64 temp;
 		const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);
 
 		ret = sprintf(buf, fmt_u64, *(u64 *)(((u8 *)stats) + offset));
 	}
-	read_unlock(&dev_base_lock);
+
 	return ret;
 }
 
-----------------------------[cut here]-----------------------------

In all fairness, I think there is precedent in the kernel for changing
so much RCU-protected code to use RTNL. I can't recall the exact link
now, except for this one:
https://patchwork.ozlabs.org/project/netdev/patch/1410306738-18036-2-git-send-email-xiyou.wangcong@gmail.com/,
but you and Cong have changed a lot of RCU-protected accesses in IPv6,
because the read-side critical section was taking too much time and was
not sleepable/preemptible.

Now, I do agree that there's only so much we can keep adding to the RTNL
mutex. I guess somebody needs to start the migration towards a different
mutex. I'll prepare some patches then, and rework the ones that I have.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ