Message-ID: <87303e90-3c74-4e4f-8fac-2001d82b90d8@blackwall.org>
Date: Fri, 22 Aug 2025 10:26:41 +0300
From: Nikolay Aleksandrov <razor@...ckwall.org>
To: Yafang Shao <laoar.shao@...il.com>, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org,
daniel@...earbox.net, bigeasy@...utronix.de, tgraf@...g.ch,
paulmck@...nel.org
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run
On 8/22/25 09:42, Yafang Shao wrote:
> During recent testing with the netem qdisc to inject delays into TCP
> traffic, we observed that our CLS BPF program failed to function correctly
> due to incorrect classid retrieval from task_get_classid(). The issue
> manifests in the following call stack:
>
> bpf_get_cgroup_classid+5
> cls_bpf_classify+507
> __tcf_classify+90
> tcf_classify+217
> __dev_queue_xmit+798
> bond_dev_queue_xmit+43
> __bond_start_xmit+211
> bond_start_xmit+70
> dev_hard_start_xmit+142
> sch_direct_xmit+161
> __qdisc_run+102 <<<<< Issue location
> __dev_xmit_skb+1015
> __dev_queue_xmit+637
> neigh_hh_output+159
> ip_finish_output2+461
> __ip_finish_output+183
> ip_finish_output+41
> ip_output+120
> ip_local_out+94
> __ip_queue_xmit+394
> ip_queue_xmit+21
> __tcp_transmit_skb+2169
> tcp_write_xmit+959
> __tcp_push_pending_frames+55
> tcp_push+264
> tcp_sendmsg_locked+661
> tcp_sendmsg+45
> inet_sendmsg+67
> sock_sendmsg+98
> sock_write_iter+147
> vfs_write+786
> ksys_write+181
> __x64_sys_write+25
> do_syscall_64+56
> entry_SYSCALL_64_after_hwframe+100
>
> The problem occurs when multiple tasks share a single qdisc. In such cases,
> __qdisc_run() may transmit skbs created by different tasks. Consequently,
> task_get_classid() retrieves an incorrect classid since it references the
> current task's context rather than the skb's originating task.
>
> Given that dev_queue_xmit() always executes with bh disabled, we can safely
> use in_softirq() instead of in_serving_softirq() to properly identify the
> softirq context and obtain the correct classid.
>
nit: you are no longer using in_softirq() in v2, so the commit message
should be updated as well.
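The shared-qdisc failure mode described in the quoted commit message can be modelled in user space. This is only a rough sketch: every struct and function name below is a hypothetical stand-in rather than a kernel API, and it reproduces neither the real task_get_classid() logic nor the v2 fix. It only shows why a lookup keyed on the running task gives the wrong answer for skbs queued by other tasks, while a lookup keyed on the skb's socket does not.

```c
/* User-space model only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct task { uint32_t classid; };       /* stands in for a task's net_cls state */
struct sock { uint32_t classid; };       /* stands in for the socket's classid */
struct skb  { const struct sock *sk; };  /* packet linked to its originating socket */

/* Lookup keyed on the task that happens to be running ("current"). */
static uint32_t classid_from_current(const struct task *current)
{
    return current->classid;
}

/* Lookup keyed on the socket that owns the skb; 0 if there is no socket. */
static uint32_t classid_from_skb(const struct skb *skb)
{
    return skb->sk ? skb->sk->classid : 0;
}

/*
 * Model of one task draining a queue shared with other tasks: count the
 * packets for which the task-based lookup disagrees with the socket-based one.
 */
static size_t count_misclassified(const struct task *current,
                                  const struct skb *queue, size_t len)
{
    size_t wrong = 0;

    assert(current != NULL && (queue != NULL || len == 0));

    for (size_t i = 0; i < len; i++) {
        if (classid_from_current(current) != classid_from_skb(&queue[i]))
            wrong++;
    }
    return wrong;
}
```

In this model, if task A drains a queue holding one of its own packets and two packets from task B's socket, count_misclassified() reports two mismatches, which corresponds to the situation marked "Issue location" in the trace above.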
[snip]
Cheers,
Nik