Message-ID: <CALOAHbDoT8kmfbM9EnRcLP2o+2YpgN6ktn+p3UJMCeA=bOFopA@mail.gmail.com>
Date: Fri, 22 Aug 2025 15:34:10 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Nikolay Aleksandrov <razor@...ckwall.org>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, 
	pabeni@...hat.com, horms@...nel.org, daniel@...earbox.net, 
	bigeasy@...utronix.de, tgraf@...g.ch, paulmck@...nel.org, 
	netdev@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run

On Fri, Aug 22, 2025 at 3:26 PM Nikolay Aleksandrov <razor@...ckwall.org> wrote:
>
> On 8/22/25 09:42, Yafang Shao wrote:
> > During recent testing with the netem qdisc to inject delays into TCP
> > traffic, we observed that our CLS BPF program failed to function correctly
> > due to incorrect classid retrieval from task_get_classid(). The issue
> > manifests in the following call stack:
> >
> >         bpf_get_cgroup_classid+5
> >         cls_bpf_classify+507
> >         __tcf_classify+90
> >         tcf_classify+217
> >         __dev_queue_xmit+798
> >         bond_dev_queue_xmit+43
> >         __bond_start_xmit+211
> >         bond_start_xmit+70
> >         dev_hard_start_xmit+142
> >         sch_direct_xmit+161
> >         __qdisc_run+102             <<<<< Issue location
> >         __dev_xmit_skb+1015
> >         __dev_queue_xmit+637
> >         neigh_hh_output+159
> >         ip_finish_output2+461
> >         __ip_finish_output+183
> >         ip_finish_output+41
> >         ip_output+120
> >         ip_local_out+94
> >         __ip_queue_xmit+394
> >         ip_queue_xmit+21
> >         __tcp_transmit_skb+2169
> >         tcp_write_xmit+959
> >         __tcp_push_pending_frames+55
> >         tcp_push+264
> >         tcp_sendmsg_locked+661
> >         tcp_sendmsg+45
> >         inet_sendmsg+67
> >         sock_sendmsg+98
> >         sock_write_iter+147
> >         vfs_write+786
> >         ksys_write+181
> >         __x64_sys_write+25
> >         do_syscall_64+56
> >         entry_SYSCALL_64_after_hwframe+100
> >
> > The problem occurs when multiple tasks share a single qdisc. In such cases,
> > __qdisc_run() may transmit skbs created by different tasks. Consequently,
> > task_get_classid() retrieves an incorrect classid since it references the
> > current task's context rather than the skb's originating task.
> >
> > Given that dev_queue_xmit() always executes with bh disabled, we can safely
> > use in_softirq() instead of in_serving_softirq() to properly identify the
> > softirq context and obtain the correct classid.
> >
>
> nit: you are no longer using in_softirq() in v2, you should update the
> commit message as well.

Oh, my bad.
I will update it.

-- 
Regards
Yafang
