Message-Id: <20231107230822.371443-22-ankur.a.arora@oracle.com>
Date:   Tue,  7 Nov 2023 15:08:14 -0800
From:   Ankur Arora <ankur.a.arora@...cle.com>
To:     linux-kernel@...r.kernel.org
Cc:     tglx@...utronix.de, peterz@...radead.org,
        torvalds@...ux-foundation.org, paulmck@...nel.org,
        linux-mm@...ck.org, x86@...nel.org, akpm@...ux-foundation.org,
        luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
        hpa@...or.com, mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
        jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        jgross@...e.com, andrew.cooper3@...rix.com, mingo@...nel.org,
        bristot@...nel.org, mathieu.desnoyers@...icios.com,
        geert@...ux-m68k.org, glaubitz@...sik.fu-berlin.de,
        anton.ivanov@...bridgegreys.com, mattst88@...il.com,
        krypton@...ich-teichert.org, rostedt@...dmis.org,
        David.Laight@...LAB.COM, richard@....at, mjguzik@...il.com,
        Ankur Arora <ankur.a.arora@...cle.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        David Ahern <dsahern@...nel.org>,
        Pablo Neira Ayuso <pablo@...filter.org>,
        Jozsef Kadlecsik <kadlec@...filter.org>,
        Florian Westphal <fw@...len.de>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>
Subject: [RFC PATCH 78/86] treewide: net: remove cond_resched()

There are broadly three sets of uses of cond_resched():

1.  Calls to cond_resched() out of the goodness of our heart,
    otherwise known as avoiding lockup splats.

2.  Open coded variants of cond_resched_lock() which call
    cond_resched().

3.  Retry or error handling loops, where cond_resched() is used as a
    quick alternative to spinning in a tight loop.

When running under a full preemption model, the cond_resched() reduces
to a NOP (not even a barrier) so removing it obviously cannot matter.

But considering only voluntary preemption models (for, say, code that
has been mostly tested under those), for set-1 and set-2 the
scheduler can now preempt kernel tasks running beyond their time
quanta anywhere they are preemptible() [1], which removes any need
for these explicitly placed scheduling points.

The cond_resched() calls in set-3 are a little more difficult.
To start with, given its NOP character under full preemption, it
never actually saved us from a tight loop.
With voluntary preemption, it's not a NOP, but it might as well be --
for most workloads the scheduler does not have an interminable supply
of runnable tasks on the runqueue.

So, cond_resched() is useful to not get softlockup splats, but not
terribly good for error handling. Ideally, these should be replaced
with some kind of timed or event wait.
For now we use cond_resched_stall(), which tries to schedule if
possible, and executes a cpu_relax() if not.
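
As a rough illustration of that "schedule if possible, otherwise relax"
behaviour, here is a small userspace model. It is only a sketch built
from the one-line description above, not the implementation in this
series. Every name in it (struct model_task, model_schedule(),
model_cpu_relax(), model_cond_resched_stall()) is hypothetical, and it
assumes "possible" means a reschedule is pending and preemption is not
disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are not the kernel's definitions. */
struct model_task {
	bool need_resched;	/* a reschedule has been requested */
	int preempt_count;	/* non-zero means preemption is disabled */
	unsigned int schedules;	/* times the model "scheduled" */
	unsigned int relaxes;	/* times the model "relaxed" */
};

/* Stand-in for schedule(): consume the pending request. */
static void model_schedule(struct model_task *t)
{
	t->need_resched = false;
	t->schedules++;
}

/* Stand-in for cpu_relax(): just count the call. */
static void model_cpu_relax(struct model_task *t)
{
	t->relaxes++;
}

/*
 * Assumed semantics: schedule when a reschedule is pending and the
 * context is preemptible, otherwise relax.
 * Returns true if the model scheduled, false if it only relaxed.
 */
static bool model_cond_resched_stall(struct model_task *t)
{
	assert(t->preempt_count >= 0);

	if (t->need_resched && t->preempt_count == 0) {
		model_schedule(t);
		return true;
	}

	model_cpu_relax(t);
	return false;
}
```

In this model a retry loop that calls model_cond_resched_stall() on
every failed attempt always does one of the two things, so it never
degenerates into a bare spin.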

All the uses here are in set-1 (some right after we give up a lock
or enable bottom-halves, causing an explicit preemption check.)

We can remove all of them.
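
To make the "right after we give up a lock" case concrete, the sketch
below models one CPU in userspace. It assumes, as the series does, that
re-enabling preemption is itself a preemption check. All sim_* names
are hypothetical and none of this is kernel code. It only shows that an
explicit cond_resched() placed directly after the unlock finds nothing
left to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one CPU; not kernel code. */
struct sim_cpu {
	int preempt_count;	/* non-zero means preemption is disabled */
	bool need_resched;	/* a reschedule has been requested */
	unsigned int schedules;	/* times the model "scheduled" */
};

static void sim_schedule(struct sim_cpu *c)
{
	c->need_resched = false;
	c->schedules++;
}

static void sim_preempt_disable(struct sim_cpu *c)
{
	c->preempt_count++;
}

/* Assumption: dropping the last preempt count is a preemption check. */
static void sim_preempt_enable(struct sim_cpu *c)
{
	assert(c->preempt_count > 0);
	if (--c->preempt_count == 0 && c->need_resched)
		sim_schedule(c);
}

/* Explicit scheduling point, modelled on cond_resched(). */
static void sim_cond_resched(struct sim_cpu *c)
{
	if (c->preempt_count == 0 && c->need_resched)
		sim_schedule(c);
}

/*
 * Run a lock/unlock loop in which a reschedule request arrives inside
 * every critical section. Returns how many times the model scheduled.
 */
static unsigned int sim_loop(unsigned int iters, bool explicit_resched)
{
	struct sim_cpu c = { 0 };

	for (unsigned int i = 0; i < iters; i++) {
		sim_preempt_disable(&c);	/* e.g. taking a lock */
		c.need_resched = true;		/* request arrives mid-section */
		sim_preempt_enable(&c);		/* unlock: schedules here */
		if (explicit_resched)
			sim_cond_resched(&c);	/* nothing pending any more */
	}

	return c.schedules;
}
```

Under these assumptions sim_loop(n, true) and sim_loop(n, false) return
the same count, which is the sense in which the explicit call is
redundant at such sites.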

[1] https://lore.kernel.org/lkml/20231107215742.363031-1-ankur.a.arora@oracle.com/

Cc: "David S. Miller" <davem@...emloft.net> 
Cc: Eric Dumazet <edumazet@...gle.com> 
Cc: Jakub Kicinski <kuba@...nel.org> 
Cc: Paolo Abeni <pabeni@...hat.com> 
Cc: David Ahern <dsahern@...nel.org> 
Cc: Pablo Neira Ayuso <pablo@...filter.org> 
Cc: Jozsef Kadlecsik <kadlec@...filter.org> 
Cc: Florian Westphal <fw@...len.de> 
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com> 
Cc: Jamal Hadi Salim <jhs@...atatu.com> 
Cc: Cong Wang <xiyou.wangcong@...il.com> 
Cc: Jiri Pirko <jiri@...nulli.us> 
Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
---
 net/core/dev.c                  | 4 ----
 net/core/neighbour.c            | 1 -
 net/core/net_namespace.c        | 1 -
 net/core/netclassid_cgroup.c    | 1 -
 net/core/rtnetlink.c            | 1 -
 net/core/sock.c                 | 2 --
 net/ipv4/inet_connection_sock.c | 3 ---
 net/ipv4/inet_diag.c            | 1 -
 net/ipv4/inet_hashtables.c      | 1 -
 net/ipv4/inet_timewait_sock.c   | 1 -
 net/ipv4/inetpeer.c             | 1 -
 net/ipv4/netfilter/arp_tables.c | 2 --
 net/ipv4/netfilter/ip_tables.c  | 3 ---
 net/ipv4/nexthop.c              | 1 -
 net/ipv4/tcp_ipv4.c             | 2 --
 net/ipv4/udp.c                  | 2 --
 net/netlink/af_netlink.c        | 1 -
 net/sched/sch_api.c             | 3 ---
 net/socket.c                    | 2 --
 19 files changed, 33 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9f3f8930c691..467715278307 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6304,7 +6304,6 @@ void napi_busy_loop(unsigned int napi_id,
 			if (!IS_ENABLED(CONFIG_PREEMPT_RT))
 				preempt_enable();
 			rcu_read_unlock();
-			cond_resched();
 			if (loop_end(loop_end_arg, start_time))
 				return;
 			goto restart;
@@ -6709,8 +6708,6 @@ static int napi_threaded_poll(void *data)
 
 			if (!repoll)
 				break;
-
-			cond_resched();
 		}
 	}
 	return 0;
@@ -11478,7 +11475,6 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
 	rtnl_lock();
 	list_for_each_entry(net, net_list, exit_list) {
 		default_device_exit_net(net);
-		cond_resched();
 	}
 
 	list_for_each_entry(net, net_list, exit_list) {
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index df81c1f0a570..86584a2ace2f 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1008,7 +1008,6 @@ static void neigh_periodic_work(struct work_struct *work)
 		 * grows while we are preempted.
 		 */
 		write_unlock_bh(&tbl->lock);
-		cond_resched();
 		write_lock_bh(&tbl->lock);
 		nht = rcu_dereference_protected(tbl->nht,
 						lockdep_is_held(&tbl->lock));
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f4183c4c1ec8..5533e8268b30 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -168,7 +168,6 @@ static void ops_exit_list(const struct pernet_operations *ops,
 	if (ops->exit) {
 		list_for_each_entry(net, net_exit_list, exit_list) {
 			ops->exit(net);
-			cond_resched();
 		}
 	}
 	if (ops->exit_batch)
diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index d6a70aeaa503..7162c3d30f1b 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -92,7 +92,6 @@ static void update_classid_task(struct task_struct *p, u32 classid)
 		task_lock(p);
 		fd = iterate_fd(p->files, fd, update_classid_sock, &ctx);
 		task_unlock(p);
-		cond_resched();
 	} while (fd);
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 53c377d054f0..c4ff7b21f906 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -140,7 +140,6 @@ void __rtnl_unlock(void)
 		struct sk_buff *next = head->next;
 
 		kfree_skb(head);
-		cond_resched();
 		head = next;
 	}
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index 16584e2dd648..c91f9fc687ba 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2982,8 +2982,6 @@ void __release_sock(struct sock *sk)
 			skb_mark_not_on_list(skb);
 			sk_backlog_rcv(sk, skb);
 
-			cond_resched();
-
 			skb = next;
 		} while (skb != NULL);
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 394a498c2823..49b90cf913a0 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -389,7 +389,6 @@ inet_csk_find_open_port(const struct sock *sk, struct inet_bind_bucket **tb_ret,
 		goto success;
 next_port:
 		spin_unlock_bh(&head->lock);
-		cond_resched();
 	}
 
 	offset--;
@@ -1420,8 +1419,6 @@ void inet_csk_listen_stop(struct sock *sk)
 		bh_unlock_sock(child);
 		local_bh_enable();
 		sock_put(child);
-
-		cond_resched();
 	}
 	if (queue->fastopenq.rskq_rst_head) {
 		/* Free all the reqs queued in rskq_rst_head. */
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index e13a84433413..45d3c9027355 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -1147,7 +1147,6 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
 		}
 		if (res < 0)
 			break;
-		cond_resched();
 		if (accum == SKARR_SZ) {
 			s_num = num + 1;
 			goto next_chunk;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 598c1b114d2c..47f86ce00704 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -1080,7 +1080,6 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		goto ok;
 next_port:
 		spin_unlock_bh(&head->lock);
-		cond_resched();
 	}
 
 	offset++;
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index dd37a5bf6881..519c77bc15ec 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -288,7 +288,6 @@ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family)
 	for (slot = 0; slot <= hashinfo->ehash_mask; slot++) {
 		struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
 restart_rcu:
-		cond_resched();
 		rcu_read_lock();
 restart:
 		sk_nulls_for_each_rcu(sk, node, &head->chain) {
diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index e9fed83e9b3c..d32a70c27cbe 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -300,7 +300,6 @@ void inetpeer_invalidate_tree(struct inet_peer_base *base)
 		p = rb_next(p);
 		rb_erase(&peer->rb_node, &base->rb_root);
 		inet_putpeer(peer);
-		cond_resched();
 	}
 
 	base->total = 0;
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 2407066b0fec..3f8c9c4f3ce0 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -622,7 +622,6 @@ static void get_counters(const struct xt_table_info *t,
 
 			ADD_COUNTER(counters[i], bcnt, pcnt);
 			++i;
-			cond_resched();
 		}
 	}
 }
@@ -642,7 +641,6 @@ static void get_old_counters(const struct xt_table_info *t,
 			ADD_COUNTER(counters[i], tmp->bcnt, tmp->pcnt);
 			++i;
 		}
-		cond_resched();
 	}
 }
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 7da1df4997d0..f8b7ae5106be 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -761,7 +761,6 @@ get_counters(const struct xt_table_info *t,
 
 			ADD_COUNTER(counters[i], bcnt, pcnt);
 			++i; /* macro does multi eval of i */
-			cond_resched();
 		}
 	}
 }
@@ -781,8 +780,6 @@ static void get_old_counters(const struct xt_table_info *t,
 			ADD_COUNTER(counters[i], tmp->bcnt, tmp->pcnt);
 			++i; /* macro does multi eval of i */
 		}
-
-		cond_resched();
 	}
 }
 
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index bbff68b5b5d4..d0f009aea17e 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -2424,7 +2424,6 @@ static void flush_all_nexthops(struct net *net)
 	while ((node = rb_first(root))) {
 		nh = rb_entry(node, struct nexthop, rb_node);
 		remove_nexthop(net, nh, NULL);
-		cond_resched();
 	}
 }
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 4167e8a48b60..d2542780447c 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2449,8 +2449,6 @@ static void *established_get_first(struct seq_file *seq)
 		struct hlist_nulls_node *node;
 		spinlock_t *lock = inet_ehash_lockp(hinfo, st->bucket);
 
-		cond_resched();
-
 		/* Lockless fast path for the common case of empty buckets */
 		if (empty_bucket(hinfo, st))
 			continue;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f39b9c844580..e01eca44559b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -281,7 +281,6 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 				snum += rand;
 			} while (snum != first);
 			spin_unlock_bh(&hslot->lock);
-			cond_resched();
 		} while (++first != last);
 		goto fail;
 	} else {
@@ -1890,7 +1889,6 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
 	kfree_skb(skb);
 
 	/* starting over for a new packet, but check if we need to yield */
-	cond_resched();
 	msg->msg_flags &= ~MSG_TRUNC;
 	goto try_again;
 }
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index eb086b06d60d..4e2ed0c5cf6e 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -843,7 +843,6 @@ static int netlink_autobind(struct socket *sock)
 	bool ok;
 
 retry:
-	cond_resched();
 	rcu_read_lock();
 	ok = !__netlink_lookup(table, portid, net);
 	rcu_read_unlock();
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index e9eaf637220e..06ec50c52ea8 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -772,7 +772,6 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
 			autohandle = TC_H_MAKE(0x80000000U, 0);
 		if (!qdisc_lookup(dev, autohandle))
 			return autohandle;
-		cond_resched();
 	} while	(--i > 0);
 
 	return 0;
@@ -923,7 +922,6 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 	u32 block_index;
 	__u32 qlen;
 
-	cond_resched();
 	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
 	if (!nlh)
 		goto out_nlmsg_trim;
@@ -1888,7 +1886,6 @@ static int tc_fill_tclass(struct sk_buff *skb, struct Qdisc *q,
 	struct gnet_dump d;
 	const struct Qdisc_class_ops *cl_ops = q->ops->cl_ops;
 
-	cond_resched();
 	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
 	if (!nlh)
 		goto out_nlmsg_trim;
diff --git a/net/socket.c b/net/socket.c
index c4a6f5532955..d6499c7c7869 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2709,7 +2709,6 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		++datagrams;
 		if (msg_data_left(&msg_sys))
 			break;
-		cond_resched();
 	}
 
 	fput_light(sock->file, fput_needed);
@@ -2944,7 +2943,6 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 		/* Out of band data, return right away */
 		if (msg_sys.msg_flags & MSG_OOB)
 			break;
-		cond_resched();
 	}
 
 	if (err == 0)
-- 
2.31.1
