[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230508183324.020f3ec7@kernel.org>
Date: Mon, 8 May 2023 18:33:24 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Peilin Ye <yepeilin.cs@...il.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Peilin Ye <peilin.ye@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.r.fastabend@...el.com>,
Vlad Buslov <vladbu@...lanox.com>,
Pedro Tammela <pctammela@...atatu.com>,
Hillf Danton <hdanton@...a.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, Cong Wang <cong.wang@...edance.com>
Subject: Re: [PATCH net 6/6] net/sched: qdisc_destroy() old ingress and
clsact Qdiscs before grafting
On Fri, 5 May 2023 17:16:10 -0700 Peilin Ye wrote:
> Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then
> adds a flower filter X to A.
>
> Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
> b2) to replace A, then adds a flower filter Y to B.
>
> Thread 1 A's refcnt Thread 2
> RTM_NEWQDISC (A, RTNL-locked)
> qdisc_create(A) 1
> qdisc_graft(A) 9
>
> RTM_NEWTFILTER (X, RTNL-lockless)
> __tcf_qdisc_find(A) 10
> tcf_chain0_head_change(A)
> mini_qdisc_pair_swap(A) (1st)
> |
> | RTM_NEWQDISC (B, RTNL-locked)
> RCU 2 qdisc_graft(B)
> | 1 notify_and_destroy(A)
> |
> tcf_block_release(A) 0 RTM_NEWTFILTER (Y, RTNL-lockless)
> qdisc_destroy(A) tcf_chain0_head_change(B)
> tcf_chain0_head_change_cb_del(A) mini_qdisc_pair_swap(B) (2nd)
> mini_qdisc_pair_swap(A) (3rd) |
> ... ...
>
> Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its
> mini Qdisc, b1. Then, A calls mini_qdisc_pair_swap() again during
> ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets
> on eth0 will not find filter Y in sch_handle_ingress().
>
> This is just one of the possible consequences of concurrently accessing
> net_device::miniq_{in,e}gress pointers. The point is clear, however:
> B's first call to mini_qdisc_pair_swap() should take place after A's
> last call, in qdisc_destroy().
Great analysis, thanks for squashing this bug.
Have you considered creating a fix more localized to the miniq
implementation? It seems that having per-device miniq pointers is
incompatible with using reference counted objects. So miniq is
a more natural place to solve the problem. Otherwise workarounds
in the core keep piling up (here qdisc_graft()).
Can we replace the rcu_assign_pointer in (3rd) with a cmpxchg()?
If active qdisc is neither a1 nor a2 we should leave the dev state
alone.
Powered by blists - more mailing lists