linux-kernel - Re: [PATCH] net/cls_cgroup: Fix task_get

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <7d2f3767-64c5-4efa-862b-f463751a03bb@iogearbox.net>
Date: Tue, 19 Aug 2025 14:42:07 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Yafang Shao <laoar.shao@...il.com>, davem@...emloft.net,
 edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 Thomas Graf <tgraf@...g.ch>, Paul McKenney <paulmck@...nel.org>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 "open list:BPF [RINGBUF]" <bpf@...r.kernel.org>,
 Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH] net/cls_cgroup: Fix task_get_classid() during qdisc run

On 8/19/25 11:37 AM, Yafang Shao wrote:
> During recent testing with the netem qdisc to inject delays into TCP
> traffic, we observed that our CLS BPF program failed to function correctly
> due to incorrect classid retrieval from task_get_classid(). The issue
> manifests in the following call stack:
> 
>          bpf_get_cgroup_classid+5
>          cls_bpf_classify+507
>          __tcf_classify+90
>          tcf_classify+217
>          __dev_queue_xmit+798
>          bond_dev_queue_xmit+43
>          __bond_start_xmit+211
>          bond_start_xmit+70
>          dev_hard_start_xmit+142
>          sch_direct_xmit+161
>          __qdisc_run+102             <<<<< Issue location
>          __dev_xmit_skb+1015
>          __dev_queue_xmit+637
>          neigh_hh_output+159
>          ip_finish_output2+461
>          __ip_finish_output+183
>          ip_finish_output+41
>          ip_output+120
>          ip_local_out+94
>          __ip_queue_xmit+394
>          ip_queue_xmit+21
>          __tcp_transmit_skb+2169
>          tcp_write_xmit+959
>          __tcp_push_pending_frames+55
>          tcp_push+264
>          tcp_sendmsg_locked+661
>          tcp_sendmsg+45
>          inet_sendmsg+67
>          sock_sendmsg+98
>          sock_write_iter+147
>          vfs_write+786
>          ksys_write+181
>          __x64_sys_write+25
>          do_syscall_64+56
>          entry_SYSCALL_64_after_hwframe+100
> 
> The problem occurs when multiple tasks share a single qdisc. In such cases,
> __qdisc_run() may transmit skbs created by different tasks. Consequently,
> task_get_classid() retrieves an incorrect classid since it references the
> current task's context rather than the skb's originating task.
> 
> Given that dev_queue_xmit() always executes with bh disabled, we can safely
> use in_softirq() instead of in_serving_softirq() to properly identify the
> softirq context and obtain the correct classid.
> 
> Signed-off-by: Yafang Shao <laoar.shao@...il.com>
> Cc: Daniel Borkmann <daniel@...earbox.net>
> Cc: Thomas Graf <tgraf@...g.ch>
> ---
>   include/net/cls_cgroup.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
> index 7e78e7d6f015..fc9e0617a73c 100644
> --- a/include/net/cls_cgroup.h
> +++ b/include/net/cls_cgroup.h
> @@ -63,7 +63,7 @@ static inline u32 task_get_classid(const struct sk_buff *skb)
>   	 * calls by looking at the number of nested bh disable calls because
>   	 * softirqs always disables bh.
>   	 */
> -	if (in_serving_softirq()) {
> +	if (in_softirq()) {
>   		struct sock *sk = skb_to_full_sk(skb);
>   
>   		/* If there is an sock_cgroup_classid we'll use that. */

Hm, essentially you only want to use the fallback method of retrieving cgroup from
the socket when the dev_queue_xmit() was triggered via ksoftirqd, rather than from
a process via syscall all the way into dev_queue_xmit(). It gets more fuzzy presumably
when skbs are queued somewhere and then some other kthread calls the dev_queue_xmit().

Looking at in_softirq(), the comment says "the following macros are deprecated and
should not be used in new code", see commit 15115830c887 ("preempt: Cleanup the
macro maze a bit"). Maybe Sebastian or Paul has input on whether in_softirq() is
still supposed to be used.

Side question, did you investigate where the skb->sk association got orphaned
somewhere along the way?

Thanks,
Daniel