[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250822064200.38149-1-laoar.shao@gmail.com>
Date: Fri, 22 Aug 2025 14:42:00 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
horms@...nel.org,
daniel@...earbox.net,
bigeasy@...utronix.de,
tgraf@...g.ch,
paulmck@...nel.org
Cc: netdev@...r.kernel.org,
bpf@...r.kernel.org,
Yafang Shao <laoar.shao@...il.com>
Subject: [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run
During recent testing with the netem qdisc to inject delays into TCP
traffic, we observed that our CLS BPF program failed to function correctly
due to incorrect classid retrieval from task_get_classid(). The issue
manifests in the following call stack:
bpf_get_cgroup_classid+5
cls_bpf_classify+507
__tcf_classify+90
tcf_classify+217
__dev_queue_xmit+798
bond_dev_queue_xmit+43
__bond_start_xmit+211
bond_start_xmit+70
dev_hard_start_xmit+142
sch_direct_xmit+161
__qdisc_run+102 <<<<< Issue location
__dev_xmit_skb+1015
__dev_queue_xmit+637
neigh_hh_output+159
ip_finish_output2+461
__ip_finish_output+183
ip_finish_output+41
ip_output+120
ip_local_out+94
__ip_queue_xmit+394
ip_queue_xmit+21
__tcp_transmit_skb+2169
tcp_write_xmit+959
__tcp_push_pending_frames+55
tcp_push+264
tcp_sendmsg_locked+661
tcp_sendmsg+45
inet_sendmsg+67
sock_sendmsg+98
sock_write_iter+147
vfs_write+786
ksys_write+181
__x64_sys_write+25
do_syscall_64+56
entry_SYSCALL_64_after_hwframe+100
The problem occurs when multiple tasks share a single qdisc. In such cases,
__qdisc_run() may transmit skbs created by different tasks. Consequently,
task_get_classid() retrieves an incorrect classid since it references the
current task's context rather than the skb's originating task.
Given that dev_queue_xmit() always executes with bh disabled, we can safely
use in_softirq() instead of in_serving_softirq() to properly identify the
softirq context and obtain the correct classid.
The simple steps to reproduce this issue:
1. Add network delay to the network interface:
such as: tc qdisc add dev bond0 root netem delay 1.5ms
2. Create two distinct net_cls cgroups, each running a network-intensive task
3. Initiate parallel TCP streams from both tasks to external servers.
Under this specific condition, the issue reliably occurs. The kernel
eventually dequeues an SKB that originated from Task-A while executing in
the context of Task-B.
Signed-off-by: Yafang Shao <laoar.shao@...il.com>
Cc: Daniel Borkmann <daniel@...earbox.net>
Cc: Thomas Graf <tgraf@...g.ch>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
---
v1->v2: use softirq_count() instead of in_softirq()
---
include/net/cls_cgroup.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
index 7e78e7d6f015..668aeee9b3f6 100644
--- a/include/net/cls_cgroup.h
+++ b/include/net/cls_cgroup.h
@@ -63,7 +63,7 @@ static inline u32 task_get_classid(const struct sk_buff *skb)
* calls by looking at the number of nested bh disable calls because
* softirqs always disables bh.
*/
- if (in_serving_softirq()) {
+ if (softirq_count()) {
struct sock *sk = skb_to_full_sk(skb);
/* If there is an sock_cgroup_classid we'll use that. */
--
2.43.5
Powered by blists - more mailing lists